Re: [PATCH 5.10 109/145] mm: make wait_on_page_writeback() wait for multiple pending writebacks

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 11 Jan 2021, Greg Kroah-Hartman wrote:

> From: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> 
> commit c2407cf7d22d0c0d94cf20342b3b8f06f1d904e7 upstream.
> 
> Ever since commit 2a9127fcf229 ("mm: rewrite wait_on_page_bit_common()
> logic") we've had some very occasional reports of BUG_ON(PageWriteback)
> in write_cache_pages(), which we thought we already fixed in commit
> 073861ed77b6 ("mm: fix VM_BUG_ON(PageTail) and BUG_ON(PageWriteback)").
> 
> But syzbot just reported another one, even with that commit in place.
> 
> And it turns out that there's a simpler way to trigger the BUG_ON() than
> the one Hugh found with page re-use.  It all boils down to the fact that
> the page writeback is ostensibly serialized by the page lock, but that
> isn't actually really true.
> 
> Yes, the people _setting_ writeback all do so under the page lock, but
> the actual clearing of the bit - and waking up any waiters - happens
> without any page lock.
> 
> This gives us this fairly simple race condition:
> 
>   CPU1 = end previous writeback
>   CPU2 = start new writeback under page lock
>   CPU3 = write_cache_pages()
> 
>   CPU1          CPU2            CPU3
>   ----          ----            ----
> 
>   end_page_writeback()
>     test_clear_page_writeback(page)
>     ... delayed...
> 
>                 lock_page();
>                 set_page_writeback()
>                 unlock_page()
> 
>                                 lock_page()
>                                 wait_on_page_writeback();
> 
>     wake_up_page(page, PG_writeback);
>     .. wakes up CPU3 ..
> 
>                                 BUG_ON(PageWriteback(page));
> 
> where the BUG_ON() happens because we woke up the PG_writeback bit
> becasue of the _previous_ writeback, but a new one had already been
> started because the clearing of the bit wasn't actually atomic wrt the
> actual wakeup or serialized by the page lock.
> 
> The reason this didn't use to happen was that the old logic in waiting
> on a page bit would just loop if it ever saw the bit set again.
> 
> The nice proper fix would probably be to get rid of the whole "wait for
> writeback to clear, and then set it" logic in the writeback path, and
> replace it with an atomic "wait-to-set" (ie the same as we have for page
> locking: we set the page lock bit with a single "lock_page()", not with
> "wait for lock bit to clear and then set it").
> 
> However, out current model for writeback is that the waiting for the
> writeback bit is done by the generic VFS code (ie write_cache_pages()),
> but the actual setting of the writeback bit is done much later by the
> filesystem ".writepages()" function.
> 
> IOW, to make the writeback bit have that same kind of "wait-to-set"
> behavior as we have for page locking, we'd have to change our roughly
> ~50 different writeback functions.  Painful.
> 
> Instead, just make "wait_on_page_writeback()" loop on the very unlikely
> situation that the PG_writeback bit is still set, basically re-instating
> the old behavior.  This is very non-optimal in case of contention, but
> since we only ever set the bit under the page lock, that situation is
> controlled.
> 
> Reported-by: syzbot+2fc0712f8f8b8b8fa0ef@xxxxxxxxxxxxxxxxxxxxxxxxx
> Fixes: 2a9127fcf229 ("mm: rewrite wait_on_page_bit_common() logic")
> Acked-by: Hugh Dickins <hughd@xxxxxxxxxx>
> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx>
> Cc: stable@xxxxxxxxxx
> Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>

I think it's too early to push this one through to stable:
Linus mentioned on Friday that Michael Larabel of Phoronix
has observed a performance regression from this commit.

Correctness outweighs performance of course, but I think
stable users might see the performance issue much sooner
than they would ever see the BUG fixed.  Wait a bit,
while we think some more about what to try next?

Hugh

> 
> ---
>  mm/page-writeback.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -2826,7 +2826,7 @@ EXPORT_SYMBOL(__test_set_page_writeback)
>   */
>  void wait_on_page_writeback(struct page *page)
>  {
> -	if (PageWriteback(page)) {
> +	while (PageWriteback(page)) {
>  		trace_wait_on_page_writeback(page, page_mapping(page));
>  		wait_on_page_bit(page, PG_writeback);
>  	}



[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux