[PATCH 2/4] xfs: don't stall background reclaim on inactivation

"Darrick J. Wong" <djwong@xxxxxxxxxx> · Mon, 19 Dec 2022 16:05:08 -0800

From: Darrick J. Wong <djwong@xxxxxxxxxx>

The online fsck stress tests deadlocked a test VM the other night.  The
deadlock happened because:

1. kswapd tried to prune the sb inode list.  XFS found that it needed to
inactivate an inode and that the queue was long enough that it should
wait for the worker.  kswapd was holding shrinker_rwsem at the time.

2. The inactivation worker allocated a transaction and then stalled
trying to obtain the AGI buffer lock.

3. An online repair function called unregister_shrinker while in
transaction context and holding the AGI lock.  It then tried to grab
shrinker_rwsem, which kswapd still held.

#3 shouldn't happen and was easily fixed, but seeing as we designed
background inodegc to avoid stalling reclaim, I feel that #1 shouldn't
be happening either.  Fix xfs_inodegc_want_flush_work to avoid stalling
background reclaim on inode inactivation.

Fixes: ab23a7768739 ("xfs: per-cpu deferred inode inactivation queues")
Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx>
---
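For reviewers, here is a minimal userspace sketch of the decision this
patch changes.  It is not the kernel code.  struct model_task,
MODEL_MAX_BACKLOG, model_want_flush_work() and model_selfcheck() are
hypothetical stand-ins for current->journal_info, current_is_kswapd()
and xfs_inodegc_want_flush_work().  The backlog check and its threshold
value are assumptions inferred from the commit message, since the hunk
below does not show that part of the function.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the calling thread's state; not a kernel type. */
struct model_task {
	bool	in_transaction;	/* models current->journal_info != NULL */
	bool	is_kswapd;	/* models current_is_kswapd() */
};

/* Hypothetical backlog limit; the real threshold is not shown in this patch. */
#define MODEL_MAX_BACKLOG	256u

/* Should the caller wait for the inactivation worker to drain its queue? */
static bool
model_want_flush_work(
	const struct model_task	*task,
	unsigned int		items,
	unsigned int		shrinker_hits)
{
	/* Never wait for other transactions from transaction context. */
	if (task->in_transaction)
		return false;

	/* Added by this patch: background reclaim must not stall either. */
	if (task->is_kswapd)
		return false;

	/* The shrinker has been asking for inodes; wait for the worker. */
	if (shrinker_hits > 0)
		return true;

	/* Assumed backlog check, inferred from the commit message. */
	return items > MODEL_MAX_BACKLOG;
}

/* Compare kswapd against any other thread under identical queue pressure. */
static void
model_selfcheck(void)
{
	struct model_task	kswapd = { .in_transaction = false, .is_kswapd = true };
	struct model_task	other = { .in_transaction = false, .is_kswapd = false };

	/* kswapd never waits, even with shrinker pressure and a long queue. */
	assert(!model_want_flush_work(&kswapd, MODEL_MAX_BACKLOG + 1, 1));

	/* Other threads still throttle under the same conditions. */
	assert(model_want_flush_work(&other, MODEL_MAX_BACKLOG + 1, 1));
}
```

In this model only the kswapd check differs between the two callers, so
threads other than kswapd keep the existing throttling behaviour.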
 fs/xfs/xfs_icache.c |    5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index f35e2cee5265..24eff2bd4062 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -2000,6 +2000,8 @@ xfs_inodegc_want_queue_work(
  *
  * Note: If the current thread is running a transaction, we don't ever want to
  * wait for other transactions because that could introduce a deadlock.
+ *
+ * Don't let kswapd background reclamation stall on inactivations.
  */
 static inline bool
 xfs_inodegc_want_flush_work(
@@ -2010,6 +2012,9 @@ xfs_inodegc_want_flush_work(
 	if (current->journal_info)
 		return false;
 
+	if (current_is_kswapd())
+		return false;
+
 	if (shrinker_hits > 0)
 		return true;