Hi,

we see machines getting stuck with a large number of backed-up processes that invoke statfs() (monitoring stuff), like:

[Sat Apr 2 09:54:32 2022] INFO: task node_exporter:244222 blocked for more than 10 seconds.
[Sat Apr 2 09:54:32 2022] Tainted: G O 5.15.26-cloudflare-2022.3.4 #1
[Sat Apr 2 09:54:32 2022] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sat Apr 2 09:54:32 2022] task:node_exporter state:D stack: 0 pid:244222 ppid: 1 flags:0x00004000
[Sat Apr 2 09:54:32 2022] Call Trace:
[Sat Apr 2 09:54:32 2022]  <TASK>
[Sat Apr 2 09:54:32 2022]  __schedule+0x2cd/0x950
[Sat Apr 2 09:54:32 2022]  schedule+0x44/0xa0
[Sat Apr 2 09:54:32 2022]  schedule_timeout+0xfc/0x140
[Sat Apr 2 09:54:32 2022]  ? try_to_wake_up+0x338/0x4e0
[Sat Apr 2 09:54:32 2022]  ? __prepare_to_swait+0x4b/0x70
[Sat Apr 2 09:54:32 2022]  wait_for_completion+0x86/0xe0
[Sat Apr 2 09:54:32 2022]  flush_work+0x5c/0x80
[Sat Apr 2 09:54:32 2022]  ? flush_workqueue_prep_pwqs+0x110/0x110
[Sat Apr 2 09:54:32 2022]  xfs_inodegc_flush.part.0+0x3b/0x90
[Sat Apr 2 09:54:32 2022]  xfs_fs_statfs+0x29/0x1c0
[Sat Apr 2 09:54:32 2022]  statfs_by_dentry+0x4d/0x70
[Sat Apr 2 09:54:32 2022]  user_statfs+0x57/0xc0
[Sat Apr 2 09:54:32 2022]  __do_sys_statfs+0x20/0x50
[Sat Apr 2 09:54:32 2022]  do_syscall_64+0x3b/0x90
[Sat Apr 2 09:54:32 2022]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[Sat Apr 2 09:54:32 2022] RIP: 0033:0x4ac9db
...

A number of 'D' state processes that grows linearly over time is usually what alerts us to this (a rough sketch of how such a count can be gathered follows the trace below). The oldest thread found waiting always appears to be the inode gc worker doing deferred inactivation:

[Sat Apr 2 09:55:47 2022] INFO: task kworker/25:0:3880332 blocked for more than 85 seconds.
[Sat Apr 2 09:55:47 2022] Tainted: G O 5.15.26-cloudflare-2022.3.4 #1
[Sat Apr 2 09:55:47 2022] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sat Apr 2 09:55:47 2022] task:kworker/25:0 state:D stack: 0 pid:3880332 ppid: 2 flags:0x00004000
[Sat Apr 2 09:55:47 2022] Workqueue: xfs-inodegc/dm-5 xfs_inodegc_worker
[Sat Apr 2 09:55:47 2022] Call Trace:
[Sat Apr 2 09:55:47 2022]  <TASK>
[Sat Apr 2 09:55:47 2022]  __schedule+0x2cd/0x950
[Sat Apr 2 09:55:47 2022]  schedule+0x44/0xa0
[Sat Apr 2 09:55:47 2022]  schedule_timeout+0xfc/0x140
[Sat Apr 2 09:55:47 2022]  __down+0x7d/0xd0
[Sat Apr 2 09:55:47 2022]  down+0x43/0x60
[Sat Apr 2 09:55:47 2022]  xfs_buf_lock+0x29/0xa0
[Sat Apr 2 09:55:47 2022]  xfs_buf_find+0x2c4/0x590
[Sat Apr 2 09:55:47 2022]  xfs_buf_get_map+0x46/0x390
[Sat Apr 2 09:55:47 2022]  xfs_buf_read_map+0x52/0x270
[Sat Apr 2 09:55:47 2022]  ? xfs_read_agf+0x87/0x110
[Sat Apr 2 09:55:47 2022]  xfs_trans_read_buf_map+0x128/0x2a0
[Sat Apr 2 09:55:47 2022]  ? xfs_read_agf+0x87/0x110
[Sat Apr 2 09:55:47 2022]  xfs_read_agf+0x87/0x110
[Sat Apr 2 09:55:47 2022]  xfs_alloc_read_agf+0x34/0x1a0
[Sat Apr 2 09:55:47 2022]  xfs_alloc_fix_freelist+0x3d7/0x4f0
[Sat Apr 2 09:55:47 2022]  ? xfs_buf_find+0x2aa/0x590
[Sat Apr 2 09:55:47 2022]  ? cpumask_next_and+0x1f/0x20
[Sat Apr 2 09:55:47 2022]  ? update_sd_lb_stats.constprop.0+0xfe/0x7d0
[Sat Apr 2 09:55:47 2022]  ? down_trylock+0x25/0x30
[Sat Apr 2 09:55:47 2022]  xfs_alloc_vextent+0x22b/0x440
[Sat Apr 2 09:55:47 2022]  __xfs_inobt_alloc_block.isra.0+0xc5/0x1a0
[Sat Apr 2 09:55:47 2022]  __xfs_btree_split+0xf2/0x610
[Sat Apr 2 09:55:47 2022]  ? xfs_btree_read_buf_block.constprop.0+0xae/0xd0
[Sat Apr 2 09:55:47 2022]  xfs_btree_split+0x4b/0x100
[Sat Apr 2 09:55:47 2022]  xfs_btree_make_block_unfull+0x193/0x1c0
[Sat Apr 2 09:55:47 2022]  xfs_btree_insrec+0x4a9/0x5a0
[Sat Apr 2 09:55:47 2022]  ? xfs_btree_read_buf_block.constprop.0+0x95/0xd0
[Sat Apr 2 09:55:47 2022]  xfs_btree_insert+0xa8/0x1f0
[Sat Apr 2 09:55:47 2022]  ? xfs_inobt_init_common+0x28/0x90
[Sat Apr 2 09:55:47 2022]  xfs_difree_finobt+0xa4/0x240
[Sat Apr 2 09:55:47 2022]  xfs_difree+0x126/0x1a0
[Sat Apr 2 09:55:47 2022]  xfs_ifree+0xca/0x4a0
[Sat Apr 2 09:55:47 2022]  ? xfs_trans_reserve+0x123/0x160
[Sat Apr 2 09:55:47 2022]  ? xfs_trans_alloc+0xec/0x1f0
[Sat Apr 2 09:55:47 2022]  xfs_inactive_ifree.isra.0+0x9e/0x1a0
[Sat Apr 2 09:55:47 2022]  xfs_inactive+0xf8/0x170
[Sat Apr 2 09:55:47 2022]  xfs_inodegc_worker+0x73/0xf0
[Sat Apr 2 09:55:47 2022]  process_one_work+0x1e6/0x380
[Sat Apr 2 09:55:47 2022]  worker_thread+0x50/0x3a0
[Sat Apr 2 09:55:47 2022]  ? rescuer_thread+0x380/0x380
[Sat Apr 2 09:55:47 2022]  kthread+0x127/0x150
[Sat Apr 2 09:55:47 2022]  ? set_kthread_struct+0x40/0x40
[Sat Apr 2 09:55:47 2022]  ret_from_fork+0x22/0x30
[Sat Apr 2 09:55:47 2022]  </TASK>
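For context, a 'D' state count like the one that alerts us can be gathered with something as simple as the following (a sketch of the idea, not our actual monitoring):

#!/usr/bin/env python3
# Rough sketch: count processes in 'D' (uninterruptible sleep) state by
# reading the state field of /proc/<pid>/stat, and print the total. The
# alerting side just samples a number like this over time.
import glob

count = 0
for path in glob.glob("/proc/[0-9]*/stat"):
    try:
        with open(path) as f:
            # The state field is the first field after the ")" that closes
            # the (possibly space-containing) comm field.
            state = f.read().rsplit(")", 1)[1].split()[0]
    except (OSError, IndexError):
        continue  # process exited between listing and reading
    if state == "D":
        count += 1
print(count)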
Once the system is in this state, I/O to the affected filesystem no longer happens; all operations against it get stuck. Other than rebooting the system, we have not found a way to recover from this.

This is a histogram (first column: number of processes 'D'-ed on that call trace) of `/proc/<PID>/stack` (a sketch of how it can be collected follows the listing):

    336 flush_work+0x5c/0x80 <= xfs_inodegc_flush.part.0+0x3b/0x90 <= xfs_fs_statfs+0x29/0x1c0 <= statfs_by_dentry+0x4d/0x70 <= user_statfs+0x57/0xc0 <= __do_sys_statfs+0x20/0x50 <= do_syscall_64+0x3b/0x90 <= entry_SYSCALL_64_after_hwframe+0x44/0xae
     48 xfs_ilock_attr_map_shared+0x24/0x50 <= xfs_attr_get+0x96/0xc0 <= xfs_xattr_get+0x75/0xb0 <= __vfs_getxattr+0x53/0x70 <= get_vfs_caps_from_disk+0x6e/0x1d0 <= audit_copy_inode+0x95/0xd0 <= path_openat+0x27f/0x1090 <= do_filp_open+0xa9/0x150 <= do_sys_openat2+0x97/0x160 <= __x64_sys_openat+0x54/0x90 <= do_syscall_64+0x3b/0x90 <= entry_SYSCALL_64_after_hwframe+0x44/0xae
     32 down+0x43/0x60 <= xfs_buf_lock+0x29/0xa0 <= xfs_buf_find+0x2c4/0x590 <= xfs_buf_get_map+0x46/0x390 <= xfs_buf_read_map+0x52/0x270 <= xfs_trans_read_buf_map+0x128/0x2a0 <= xfs_read_agi+0x8e/0x120 <= xfs_iunlink_remove+0x59/0x220 <= xfs_ifree+0x7d/0x4a0 <= xfs_inactive_ifree.isra.0+0x9e/0x1a0 <= xfs_inactive+0xf8/0x170 <= xfs_inodegc_worker+0x73/0xf0 <= process_one_work+0x1e6/0x380 <= worker_thread+0x50/0x3a0 <= kthread+0x127/0x150 <= ret_from_fork+0x22/0x30
      5 down+0x43/0x60 <= xfs_buf_lock+0x29/0xa0 <= xfs_buf_find+0x2c4/0x590 <= xfs_buf_get_map+0x46/0x390 <= xfs_buf_read_map+0x52/0x270 <= xfs_trans_read_buf_map+0x128/0x2a0 <= xfs_read_agi+0x8e/0x120 <= xfs_ialloc_read_agi+0x26/0xa0 <= xfs_dialloc+0x171/0x720 <= xfs_create+0x394/0x610 <= xfs_generic_create+0x11e/0x360 <= path_openat+0xe1c/0x1090 <= do_filp_open+0xa9/0x150 <= do_sys_openat2+0x97/0x160 <= __x64_sys_openat+0x54/0x90 <= do_syscall_64+0x3b/0x90 <= entry_SYSCALL_64_after_hwframe+0x44/0xae
      1 down+0x43/0x60 <= xfs_buf_lock+0x29/0xa0 <= xfs_buf_find+0x2c4/0x590 <= xfs_buf_get_map+0x46/0x390 <= xfs_buf_read_map+0x52/0x270 <= xfs_trans_read_buf_map+0x128/0x2a0 <= xfs_read_agi+0x8e/0x120 <= xfs_iunlink+0x6b/0x1c0 <= xfs_remove+0x2c5/0x410 <= xfs_vn_unlink+0x53/0xa0 <= vfs_unlink+0x113/0x280 <= do_unlinkat+0x1ab/0x2d0 <= __x64_sys_unlink+0x3e/0x60 <= do_syscall_64+0x3b/0x90 <= entry_SYSCALL_64_after_hwframe+0x44/0xae
      1 down+0x43/0x60 <= xfs_buf_lock+0x29/0xa0 <= xfs_buf_find+0x2c4/0x590 <= xfs_buf_get_map+0x46/0x390 <= xfs_buf_read_map+0x52/0x270 <= xfs_trans_read_buf_map+0x128/0x2a0 <= xfs_read_agf+0x87/0x110 <= xfs_alloc_read_agf+0x34/0x1a0 <= xfs_alloc_fix_freelist+0x3d7/0x4f0 <= xfs_free_extent_fix_freelist+0x61/0xa0 <= __xfs_free_extent+0x6b/0x1a0 <= xfs_trans_free_extent+0x3b/0xd0 <= xfs_extent_free_finish_item+0x24/0x40 <= xfs_defer_finish_noroll+0x1f1/0x580 <= xfs_defer_finish+0x11/0x70 <= xfs_itruncate_extents_flags+0xc1/0x220 <= xfs_free_eofblocks+0xbc/0x120 <= xfs_icwalk_ag+0x3a9/0x640 <= xfs_blockgc_worker+0x31/0xf0 <= process_one_work+0x1e6/0x380 <= worker_thread+0x50/0x3a0 <= kthread+0x127/0x150 <= ret_from_fork+0x22/0x30
      1 down+0x43/0x60 <= xfs_buf_lock+0x29/0xa0 <= xfs_buf_find+0x2c4/0x590 <= xfs_buf_get_map+0x46/0x390 <= xfs_buf_read_map+0x52/0x270 <= xfs_trans_read_buf_map+0x128/0x2a0 <= xfs_read_agf+0x87/0x110 <= xfs_alloc_read_agf+0x34/0x1a0 <= xfs_alloc_fix_freelist+0x3d7/0x4f0 <= xfs_alloc_vextent+0x22b/0x440 <= __xfs_inobt_alloc_block.isra.0+0xc5/0x1a0 <= __xfs_btree_split+0xf2/0x610 <= xfs_btree_split+0x4b/0x100 <= xfs_btree_make_block_unfull+0x193/0x1c0 <= xfs_btree_insrec+0x4a9/0x5a0 <= xfs_btree_insert+0xa8/0x1f0 <= xfs_difree_finobt+0xa4/0x240 <= xfs_difree+0x126/0x1a0 <= xfs_ifree+0xca/0x4a0 <= xfs_inactive_ifree.isra.0+0x9e/0x1a0 <= xfs_inactive+0xf8/0x170 <= xfs_inodegc_worker+0x73/0xf0 <= process_one_work+0x1e6/0x380 <= worker_thread+0x50/0x3a0 <= kthread+0x127/0x150 <= ret_from_fork+0x22/0x30
      1 down+0x43/0x60 <= xfs_buf_lock+0x29/0xa0 <= xfs_buf_find+0x2c4/0x590 <= xfs_buf_get_map+0x46/0x390 <= xfs_buf_read_map+0x52/0x270 <= xfs_trans_read_buf_map+0x128/0x2a0 <= xfs_imap_to_bp+0x4e/0x70 <= xfs_trans_log_inode+0x1d0/0x280 <= xfs_bmap_btalloc+0x75f/0x820 <= xfs_bmapi_allocate+0xe4/0x310 <= xfs_bmapi_convert_delalloc+0x273/0x490 <= xfs_map_blocks+0x1b5/0x400 <= iomap_do_writepage+0x11d/0x820 <= write_cache_pages+0x189/0x3e0 <= iomap_writepages+0x1c/0x40 <= xfs_vm_writepages+0x71/0xa0 <= do_writepages+0xc3/0x1e0 <= __writeback_single_inode+0x37/0x270 <= writeback_sb_inodes+0x1ed/0x420 <= __writeback_inodes_wb+0x4c/0xd0 <= wb_writeback+0x1ba/0x270 <= wb_workfn+0x292/0x4d0 <= process_one_work+0x1e6/0x380 <= worker_thread+0x50/0x3a0 <= kthread+0x127/0x150 <= ret_from_fork+0x22/0x30
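The histogram above was produced with something along these lines (a simplified sketch, not our exact tooling; reading other tasks' /proc/<PID>/stack typically needs root):

#!/usr/bin/env python3
# Simplified sketch of how a histogram like the one above can be built:
# for every task currently in 'D' state, read /proc/<pid>/stack, join the
# frames with " <= ", and count identical traces.
from collections import Counter
import glob

def state(pid_dir):
    # The state field is the first field after the ")" closing the comm field.
    with open(pid_dir + "/stat") as f:
        return f.read().rsplit(")", 1)[1].split()[0]

def stack(pid_dir):
    # Lines look like "[<0>] flush_work+0x5c/0x80"; keep the symbol part.
    with open(pid_dir + "/stack") as f:
        return " <= ".join(l.split("] ", 1)[1].strip() for l in f if "] " in l)

histogram = Counter()
for pid_dir in glob.glob("/proc/[0-9]*"):
    try:
        if state(pid_dir) == "D":
            histogram[stack(pid_dir)] += 1
    except (OSError, IndexError):
        continue  # task went away, or its stack is not readable

for trace, count in histogram.most_common():
    print("%7d %s" % (count, trace))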
Is this something that has been seen before? The code path from xfs_inactive() through xfs_btree_split() and xfs_alloc_fix_freelist() down to xfs_buf_lock() doesn't look very new to me (an old report showing that call path in a different context is, for example, https://www.spinics.net/lists/linux-mm/msg147472.html). But the introduction of deferred inode inactivation last summer (in 5.15) - and this path now going through xfs_inodegc_worker - appears to roughly correlate with us starting to see these hangs.

There is no "advance warning" for these (no preceding OOMs, no disk-full condition) as far as we've found.

How would we go about addressing this hang?

Thanks in advance,
Frank Hofmann