On Tue, Mar 15, 2022 at 03:07:10PM -0400, Brian Foster wrote: > What seems to happen is that the majority of the fsync calls end up > waiting on writeback of a particular page, the wakeup of the writeback > bit on that page wakes a task that immediately resets PG_writeback on > the page such that N other folio_wait_writeback() waiters see the bit > still set and immediately place themselves back onto the tail of the > wait queue. Meanwhile the waker task spins in the WQ_FLAG_BOOKMARK loop > in folio_wake_bit() (backing off the lock for a cycle or so in each > iteration) only to find the same bunch of tasks in the queue. This > process repeats for a long enough amount of time to trigger the soft > lockup warning. I've confirmed this spinning behavior with a tracepoint > in the bookmark loop that indicates we're stuck for many hundreds of > thousands of iterations (at least) of this loop when the soft lockup > warning triggers. [...] > I've run a few quick experiments to try and corroborate this analysis. > The problem goes away completely if I either back out the loop change in > folio_wait_writeback() or bump WAITQUEUE_WALK_BREAK_CNT to something > like 128 (i.e. greater than the total possible number of waiter tasks in > this test). I've also played a few games with bookmark behavior mostly > out of curiosity, but usually end up introducing other problems like > missed wakeups, etc. As I recall, the bookmark hack was introduced in order to handle lock_page() problems. It wasn't really supposed to handle writeback, but nobody thought it would cause any harm (and indeed, it didn't at the time). So how about we only use bookmarks for lock_page(), since lock_page() usually doesn't have the multiple-waker semantics that writeback has? (this is more in the spirit of "minimal patch" -- I think initialising the bookmark should be moved to folio_unlock()). diff --git a/mm/filemap.c b/mm/filemap.c index b2728eb52407..9ee3c5f1f489 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1146,26 +1146,28 @@ static int wake_page_function(wait_queue_entry_t *wait, unsigned mode, int sync, return (flags & WQ_FLAG_EXCLUSIVE) != 0; } -static void folio_wake_bit(struct folio *folio, int bit_nr) +static void folio_wake_bit(struct folio *folio, int bit_nr, + wait_queue_entry_t *bookmark) { wait_queue_head_t *q = folio_waitqueue(folio); struct wait_page_key key; unsigned long flags; - wait_queue_entry_t bookmark; key.folio = folio; key.bit_nr = bit_nr; key.page_match = 0; - bookmark.flags = 0; - bookmark.private = NULL; - bookmark.func = NULL; - INIT_LIST_HEAD(&bookmark.entry); + if (bookmark) { + bookmark->flags = 0; + bookmark->private = NULL; + bookmark->func = NULL; + INIT_LIST_HEAD(&bookmark->entry); + } spin_lock_irqsave(&q->lock, flags); - __wake_up_locked_key_bookmark(q, TASK_NORMAL, &key, &bookmark); + __wake_up_locked_key_bookmark(q, TASK_NORMAL, &key, bookmark); - while (bookmark.flags & WQ_FLAG_BOOKMARK) { + while (bookmark && (bookmark->flags & WQ_FLAG_BOOKMARK)) { /* * Take a breather from holding the lock, * allow pages that finish wake up asynchronously @@ -1175,7 +1177,7 @@ static void folio_wake_bit(struct folio *folio, int bit_nr) spin_unlock_irqrestore(&q->lock, flags); cpu_relax(); spin_lock_irqsave(&q->lock, flags); - __wake_up_locked_key_bookmark(q, TASK_NORMAL, &key, &bookmark); + __wake_up_locked_key_bookmark(q, TASK_NORMAL, &key, bookmark); } /* @@ -1204,7 +1206,7 @@ static void folio_wake(struct folio *folio, int bit) { if (!folio_test_waiters(folio)) return; - folio_wake_bit(folio, bit); + folio_wake_bit(folio, bit, NULL); } /* @@ -1554,12 +1556,15 @@ static inline bool clear_bit_unlock_is_negative_byte(long nr, volatile void *mem */ void folio_unlock(struct folio *folio) { + wait_queue_entry_t bookmark; + /* Bit 7 allows x86 to check the byte's sign bit */ BUILD_BUG_ON(PG_waiters != 7); BUILD_BUG_ON(PG_locked > 7); VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio); + if (clear_bit_unlock_is_negative_byte(PG_locked, folio_flags(folio, 0))) - folio_wake_bit(folio, PG_locked); + folio_wake_bit(folio, PG_locked, &bookmark); } EXPORT_SYMBOL(folio_unlock); @@ -1578,7 +1583,7 @@ void folio_end_private_2(struct folio *folio) { VM_BUG_ON_FOLIO(!folio_test_private_2(folio), folio); clear_bit_unlock(PG_private_2, folio_flags(folio, 0)); - folio_wake_bit(folio, PG_private_2); + folio_wake_bit(folio, PG_private_2, NULL); folio_put(folio); } EXPORT_SYMBOL(folio_end_private_2);