On Fri, May 20, 2022 at 11:56:06AM +0000, bugzilla-daemon@xxxxxxxxxx wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=216007 > > Bug ID: 216007 > Summary: XFS hangs in iowait when extracting large number of > files > Product: File System > Version: 2.5 > Kernel Version: 5.15.32 > Hardware: All > OS: Linux > Tree: Mainline > Status: NEW > Severity: normal > Priority: P1 > Component: XFS > Assignee: filesystem_xfs@xxxxxxxxxxxxxxxxxxxxxx > Reporter: bugzkernelorg8392@xxxxxxxxx > Regression: No > > Created attachment 301008 > --> https://bugzilla.kernel.org/attachment.cgi?id=301008&action=edit > output from dmesg after echo w > /proc/sysrq-trigger > > Overview: > > When I try to extract an uncompressed tar archive (2.6 milion files, 760.3 GiB > in size) on newly created (empty) XFS file system, after first low tens of > gigabytes extracted the process hangs in iowait indefinitely. One CPU core is > 100% occupied with iowait, the other CPU core is idle (on 2-core Intel Celeron > G1610T). > > I have kernel compiled with my .config file. When I try this with a more > "standard" kernel, the problem is not reproducible. > > Steps to Reproduce: > > 1) compile the kernel with the attached .config > > 2) reboot with this kernel > > 3) create a new XFS filesystem on a spare drive (just mkfs.xfs -f <dev>) > > 4) mount this new file system > > 5) try to extract large amount of data there > > Actual results: > > After 20-40 GiB written, the process hangs in iowait indefinitely, never > finishing the archive extraction. [ 805.233836] task:tar state:D stack: 0 pid: 2492 ppid: 2491 flags:0x00004000 [ 805.233840] Call Trace: [ 805.233841] <TASK> [ 805.233842] __schedule+0x1c9/0x510 [ 805.233846] ? lock_timer_base+0x5c/0x80 [ 805.233850] schedule+0x3f/0xa0 [ 805.233853] schedule_timeout+0x7c/0xf0 [ 805.233858] ? init_timer_key+0x30/0x30 [ 805.233862] io_schedule_timeout+0x47/0x70 [ 805.233866] congestion_wait+0x79/0xd0 [ 805.233872] ? wait_woken+0x60/0x60 [ 805.233876] xfs_buf_alloc_pages+0xd0/0x1b0 [ 805.233881] xfs_buf_get_map+0x259/0x300 [ 805.233886] ? xfs_buf_item_init+0x150/0x160 [ 805.233892] xfs_trans_get_buf_map+0xa9/0x120 [ 805.233897] xfs_ialloc_inode_init+0x129/0x2d0 [ 805.233901] ? xfs_ialloc_ag_alloc+0x1df/0x630 [ 805.233904] xfs_ialloc_ag_alloc+0x1df/0x630 [ 805.233908] xfs_dialloc+0x1b4/0x720 [ 805.233912] xfs_create+0x1d7/0x450 [ 805.233917] xfs_generic_create+0x114/0x2d0 [ 805.233922] path_openat+0x510/0xe10 [ 805.233925] do_filp_open+0xad/0x150 [ 805.233929] ? xfs_blockgc_clear_iflag+0x93/0xb0 [ 805.233932] ? xfs_iunlock+0x52/0x90 [ 805.233937] do_sys_openat2+0x91/0x150 [ 805.233942] __x64_sys_openat+0x4e/0x90 [ 805.233946] do_syscall_64+0x43/0x90 [ 805.233952] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 805.233959] RIP: 0033:0x7f763ccc9572 [ 805.233962] RSP: 002b:00007ffef1391530 EFLAGS: 00000246 ORIG_RAX: 0000000000000101 [ 805.233966] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f763ccc9572 [ 805.233969] RDX: 00000000000809c1 RSI: 000055b1d5b19270 RDI: 0000000000000004 [ 805.233971] RBP: 0000000000000180 R08: 000000000000c0c0 R09: 000055b1d5b145f0 [ 805.233973] R10: 0000000000000180 R11: 0000000000000246 R12: 0000000000000000 [ 805.233974] R13: 00000000000809c1 R14: 000055b1d5b19270 R15: 000055b1d59d2248 [ 805.233977] </TASK> It's waiting on memory allocation, which is probably waiting on IO completion somewhere to clean dirty pages. This suggests there's a problem with the storage hardware, the storage stack below XFS or there's an issue with memory cleaning/reclaim stalling and not making progress. > Expected Results: > > Archive extraction continues smoothly until done. > > Build Date & Hardware: > > 2022-05-01 on HP ProLiant MicroServer Gen8, 4GB ECC RAM > > Additional Information: > > No other filesystem tested with the same archive on the same hardware before or > after this (ext2, ext3, ext4, reiserfs3, jfs, nilfs2, f2fs, btrfs, zfs) has > shown this behavior. When I downgraded the kernel to 5.10.109, the XFS started > working again. Kernel versions higher than 5.15 seem to be affected, I tried > 5.17.1, 5.17.6 and 5.18.0-rc7, they all hang up after a few minutes. Doesn't actually look like an XFS problem from the evidence supplied, though. What sort of storage subsystem does this machine have? If it's a spinning disk then you've probably just filled memory > More could be found here: https://forums.gentoo.org/viewtopic-p-8709116.html Oh, wait: "I compiled a more mainstream version of sys-kernel/gentoo-sources-5.15.32-r1 (removed my .config file and let genkernel to fill it with default options) and lo and behold, in this kernel I could not make it go stuck anymore. [....] However, after I altered my old kernel config to contain these values and rebooting, I'm still triggering the bug. It may not be a XFS issue after all." >From the evidence presented, I'd agree that this doesn't look an XFS problem, either. Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx