Re: writeback completion soft lockup BUG in folio_wake_bit()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Mar 15, 2022 at 03:07:10PM -0400, Brian Foster wrote:
> What seems to happen is that the majority of the fsync calls end up
> waiting on writeback of a particular page, the wakeup of the writeback
> bit on that page wakes a task that immediately resets PG_writeback on
> the page such that N other folio_wait_writeback() waiters see the bit
> still set and immediately place themselves back onto the tail of the
> wait queue.  Meanwhile the waker task spins in the WQ_FLAG_BOOKMARK loop
> in folio_wake_bit() (backing off the lock for a cycle or so in each
> iteration) only to find the same bunch of tasks in the queue. This
> process repeats for a long enough amount of time to trigger the soft
> lockup warning. I've confirmed this spinning behavior with a tracepoint
> in the bookmark loop that indicates we're stuck for many hundreds of
> thousands of iterations (at least) of this loop when the soft lockup
> warning triggers.

[...]

> I've run a few quick experiments to try and corroborate this analysis.
> The problem goes away completely if I either back out the loop change in
> folio_wait_writeback() or bump WAITQUEUE_WALK_BREAK_CNT to something
> like 128 (i.e. greater than the total possible number of waiter tasks in
> this test). I've also played a few games with bookmark behavior mostly
> out of curiosity, but usually end up introducing other problems like
> missed wakeups, etc.

As I recall, the bookmark hack was introduced in order to handle
lock_page() problems.  It wasn't really supposed to handle writeback,
but nobody thought it would cause any harm (and indeed, it didn't at the
time).  So how about we only use bookmarks for lock_page(), since
lock_page() usually doesn't have the multiple-waker semantics that
writeback has?

(this is more in the spirit of "minimal patch" -- I think initialising
the bookmark should be moved to folio_unlock()).

diff --git a/mm/filemap.c b/mm/filemap.c
index b2728eb52407..9ee3c5f1f489 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1146,26 +1146,28 @@ static int wake_page_function(wait_queue_entry_t *wait, unsigned mode, int sync,
 	return (flags & WQ_FLAG_EXCLUSIVE) != 0;
 }
 
-static void folio_wake_bit(struct folio *folio, int bit_nr)
+static void folio_wake_bit(struct folio *folio, int bit_nr,
+		wait_queue_entry_t *bookmark)
 {
 	wait_queue_head_t *q = folio_waitqueue(folio);
 	struct wait_page_key key;
 	unsigned long flags;
-	wait_queue_entry_t bookmark;
 
 	key.folio = folio;
 	key.bit_nr = bit_nr;
 	key.page_match = 0;
 
-	bookmark.flags = 0;
-	bookmark.private = NULL;
-	bookmark.func = NULL;
-	INIT_LIST_HEAD(&bookmark.entry);
+	if (bookmark) {
+		bookmark->flags = 0;
+		bookmark->private = NULL;
+		bookmark->func = NULL;
+		INIT_LIST_HEAD(&bookmark->entry);
+	}
 
 	spin_lock_irqsave(&q->lock, flags);
-	__wake_up_locked_key_bookmark(q, TASK_NORMAL, &key, &bookmark);
+	__wake_up_locked_key_bookmark(q, TASK_NORMAL, &key, bookmark);
 
-	while (bookmark.flags & WQ_FLAG_BOOKMARK) {
+	while (bookmark && (bookmark->flags & WQ_FLAG_BOOKMARK)) {
 		/*
 		 * Take a breather from holding the lock,
 		 * allow pages that finish wake up asynchronously
@@ -1175,7 +1177,7 @@ static void folio_wake_bit(struct folio *folio, int bit_nr)
 		spin_unlock_irqrestore(&q->lock, flags);
 		cpu_relax();
 		spin_lock_irqsave(&q->lock, flags);
-		__wake_up_locked_key_bookmark(q, TASK_NORMAL, &key, &bookmark);
+		__wake_up_locked_key_bookmark(q, TASK_NORMAL, &key, bookmark);
 	}
 
 	/*
@@ -1204,7 +1206,7 @@ static void folio_wake(struct folio *folio, int bit)
 {
 	if (!folio_test_waiters(folio))
 		return;
-	folio_wake_bit(folio, bit);
+	folio_wake_bit(folio, bit, NULL);
 }
 
 /*
@@ -1554,12 +1556,15 @@ static inline bool clear_bit_unlock_is_negative_byte(long nr, volatile void *mem
  */
 void folio_unlock(struct folio *folio)
 {
+	wait_queue_entry_t bookmark;
+
 	/* Bit 7 allows x86 to check the byte's sign bit */
 	BUILD_BUG_ON(PG_waiters != 7);
 	BUILD_BUG_ON(PG_locked > 7);
 	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
+
 	if (clear_bit_unlock_is_negative_byte(PG_locked, folio_flags(folio, 0)))
-		folio_wake_bit(folio, PG_locked);
+		folio_wake_bit(folio, PG_locked, &bookmark);
 }
 EXPORT_SYMBOL(folio_unlock);
 
@@ -1578,7 +1583,7 @@ void folio_end_private_2(struct folio *folio)
 {
 	VM_BUG_ON_FOLIO(!folio_test_private_2(folio), folio);
 	clear_bit_unlock(PG_private_2, folio_flags(folio, 0));
-	folio_wake_bit(folio, PG_private_2);
+	folio_wake_bit(folio, PG_private_2, NULL);
 	folio_put(folio);
 }
 EXPORT_SYMBOL(folio_end_private_2);



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux