From: Tycho Andersen <tandersen@xxxxxxxxxxx> Similar to 5a8bee63b10f ("fuse: in fuse_flush only wait if someone wants the return code"), we have a task that is stuck that can't be killed, so a pid ns can't exit, wreaking all kinds of havoc: INFO: task C2 CompilerThre:3546103 blocked for more than 1912 seconds. Tainted: G OE 5.15.35netflix-g54efd87a8576 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:C2 CompilerThre state:D stack: 0 pid:3546103 ppid:3546047 flags:0x00004222 Call Trace: <TASK> __schedule+0x2c5/0x8d0 schedule+0x3a/0xa0 schedule_timeout+0x115/0x280 ? __schedule+0x2cd/0x8d0 wait_for_completion+0x9f/0x100 __flush_work+0x161/0x1f0 ? worker_detach_from_pool+0xb0/0xb0 destroy_inode+0x3b/0x70 __dentry_kill+0xcc/0x160 dput+0x141/0x2e0 ovl_destroy_inode+0x15/0x50 [overlay] destroy_inode+0x3b/0x70 __dentry_kill+0xcc/0x160 dput+0x141/0x2e0 __fput+0xe1/0x250 task_work_run+0x73/0xb0 do_exit+0x37e/0xb80 do_group_exit+0x3a/0xa0 get_signal+0x140/0x870 ? perf_event_groups_first+0x80/0x80 arch_do_signal_or_restart+0xae/0x7c0 ? __x64_sys_futex+0x5e/0x1d0 ? __x64_sys_futex+0x5e/0x1d0 exit_to_user_mode_prepare+0x10f/0x1c0 syscall_exit_to_user_mode+0x26/0x40 do_syscall_64+0x46/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f3295cf3cf5 RSP: 002b:00007f327c834d00 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca RAX: fffffffffffffe00 RBX: 00007f32900bde50 RCX: 00007f3295cf3cf5 RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00007f32900bde78 RBP: 00007f327c834dd0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f32900bde74 R13: 00007f32900bde78 R14: 00007f32900bde28 R15: 0000000000000000 </TASK> The bad call path is: xfs_fs_destroy_inode() -> xfs_inode_mark_reclaimable -> xfs_inodegc_queue() -> xfs_inodegc_want_queue_work() xfs_inodegc_want_flush_work() -> flush_work() -> __flush_work() -> wait_for_completion() We can avoid this task getting stuck by just not queuing the gc work from do_exit(). The fact that there's a lockup at all probably indicative of another xfs bug somewhere else that I am still looking for, but we should at least not generate unkillable tasks as a result. Signed-off-by: Tycho Andersen <tandersen@xxxxxxxxxxx> CC: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx> --- fs/xfs/xfs_icache.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c index 351849fc18ff..90e94d84f8ad 100644 --- a/fs/xfs/xfs_icache.c +++ b/fs/xfs/xfs_icache.c @@ -2011,6 +2011,9 @@ xfs_inodegc_want_queue_work( * * Note: If the current thread is running a transaction, we don't ever want to * wait for other transactions because that could introduce a deadlock. + * If the task is currently exiting there is nobody to wait for + * the flush and it can deadlock, so we should not try to flush in this case + * either. */ static inline bool xfs_inodegc_want_flush_work( @@ -2021,6 +2024,9 @@ xfs_inodegc_want_flush_work( if (current->journal_info) return false; + if (current->flags & PF_EXITING) + return false; + if (shrinker_hits > 0) return true; base-commit: 78b421b6a7c6dbb6a213877c742af52330f5026d -- 2.34.1