[PATCH] xfs: don't do inodgc work if task is exiting

Tycho Andersen <tycho@tycho.pizza> · Thu, 11 May 2023 09:17:02 -0600

From: Tycho Andersen <tandersen@xxxxxxxxxxx>

Similar to 5a8bee63b10f ("fuse: in fuse_flush only wait if someone wants
the return code"), we have a task that is stuck that can't be killed, so a
pid ns can't exit, wreaking all kinds of havoc:

INFO: task C2 CompilerThre:3546103 blocked for more than 1912 seconds.
      Tainted: G           OE     5.15.35netflix-g54efd87a8576 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:C2 CompilerThre state:D stack:    0 pid:3546103 ppid:3546047 flags:0x00004222
Call Trace:
 <TASK>
 __schedule+0x2c5/0x8d0
 schedule+0x3a/0xa0
 schedule_timeout+0x115/0x280
 ? __schedule+0x2cd/0x8d0
 wait_for_completion+0x9f/0x100
 __flush_work+0x161/0x1f0
 ? worker_detach_from_pool+0xb0/0xb0
 destroy_inode+0x3b/0x70
 __dentry_kill+0xcc/0x160
 dput+0x141/0x2e0
 ovl_destroy_inode+0x15/0x50 [overlay]
 destroy_inode+0x3b/0x70
 __dentry_kill+0xcc/0x160
 dput+0x141/0x2e0
 __fput+0xe1/0x250
 task_work_run+0x73/0xb0
 do_exit+0x37e/0xb80
 do_group_exit+0x3a/0xa0
 get_signal+0x140/0x870
 ? perf_event_groups_first+0x80/0x80
 arch_do_signal_or_restart+0xae/0x7c0
 ? __x64_sys_futex+0x5e/0x1d0
 ? __x64_sys_futex+0x5e/0x1d0
 exit_to_user_mode_prepare+0x10f/0x1c0
 syscall_exit_to_user_mode+0x26/0x40
 do_syscall_64+0x46/0xb0
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f3295cf3cf5
RSP: 002b:00007f327c834d00 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
RAX: fffffffffffffe00 RBX: 00007f32900bde50 RCX: 00007f3295cf3cf5
RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00007f32900bde78
RBP: 00007f327c834dd0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f32900bde74
R13: 00007f32900bde78 R14: 00007f32900bde28 R15: 0000000000000000
 </TASK>

The bad call path is:

xfs_fs_destroy_inode() ->
  xfs_inode_mark_reclaimable ->
    xfs_inodegc_queue() ->
      xfs_inodegc_want_queue_work()
      xfs_inodegc_want_flush_work() ->
        flush_work() ->
          __flush_work() ->
            wait_for_completion()

We can avoid this task getting stuck by just not queuing the gc work from
do_exit().

The fact that there's a lockup at all probably indicative of another xfs
bug somewhere else that I am still looking for, but we should at least not
generate unkillable tasks as a result.

Signed-off-by: Tycho Andersen <tandersen@xxxxxxxxxxx>
CC: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
---
 fs/xfs/xfs_icache.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 351849fc18ff..90e94d84f8ad 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -2011,6 +2011,9 @@ xfs_inodegc_want_queue_work(
  *
  * Note: If the current thread is running a transaction, we don't ever want to
  * wait for other transactions because that could introduce a deadlock.
+ * If the task is currently exiting there is nobody to wait for
+ * the flush and it can deadlock, so we should not try to flush in this case
+ * either.
  */
 static inline bool
 xfs_inodegc_want_flush_work(
@@ -2021,6 +2024,9 @@ xfs_inodegc_want_flush_work(
 	if (current->journal_info)
 		return false;
 
+	if (current->flags & PF_EXITING)
+		return false;
+
 	if (shrinker_hits > 0)
 		return true;
 

base-commit: 78b421b6a7c6dbb6a213877c742af52330f5026d
-- 
2.34.1