Hello,
I have a puzzling problem with XFS on Debian 10. I am running a
number-crunching workload driven by Node.js: a single process creates
about 2 million files per day, each 1MB to 5MB in size and with a
lifespan of about 24 hours (weather forecasting). The file system is
obviously heavily fragmented. I have absolutely no problems while this
runs in steady state, but every time I stop the process, especially
after it has been running for a few weeks or months, it becomes a zombie
(freeing all its user memory and file descriptors) and xfsaild/kworker
then keep flushing the log for about 30-45 minutes before the process
really exits. Until then it keeps its network ports bound (which is my
main problem), although the system remains responsive and usable. The
I/O pattern is several seconds of random reads followed by a second or
two of sequential writes.
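
For a sense of scale, the figures above can be turned into rough rates. This is only a back-of-the-envelope sketch: it assumes decimal megabytes and an even spread of file creation over 24 hours, neither of which is stated above, and the function name is made up for illustration.

```python
# Back-of-the-envelope rates for the workload described above.
# Assumptions (not stated in the post): decimal megabytes, and file
# creation spread evenly over 24 hours.
FILES_PER_DAY = 2_000_000
MIN_FILE_MB, MAX_FILE_MB = 1, 5
SECONDS_PER_DAY = 24 * 60 * 60


def workload_rates(files_per_day, min_mb, max_mb):
    """Return (files per second, lowest MB/s, highest MB/s)."""
    files_per_s = files_per_day / SECONDS_PER_DAY
    return files_per_s, files_per_s * min_mb, files_per_s * max_mb


files_per_s, low_mb_s, high_mb_s = workload_rates(
    FILES_PER_DAY, MIN_FILE_MB, MAX_FILE_MB
)
print(f"{files_per_s:.1f} files/s created")
print(f"{low_mb_s:.0f} to {high_mb_s:.0f} MB/s written on average")
```

With a lifespan of about 24 hours, files are removed at roughly the same average rate as they are created once the workload reaches steady state.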
The kernel functions running in the zombie process context are mainly:
  xfs_btree_lookup, xfs_log_commit_cil, xfs_next_bit, xfs_buf_find_isra.26
xfsaild is spending its time in:
  radix_tree_next_chunk, xfs_inode_buf_verify
kworker is in:
  xfs_reclaim_inode, radix_tree_next_chunk
This is on a standard, up-to-date Debian 10:
Linux version 4.19.0-16-amd64 (debian-kernel@xxxxxxxxxxxxxxxx) (gcc
version 8.3.0 (Debian 8.3.0-6)) #1 SMP Debian 4.19.181-1 (2021-03-19)
xfsprogs 4.20.0-1
The file system is on RAID-0 across 2x2TB disks, with LVM over md (512k chunks):
meta-data=/dev/mapper/vg0-home   isize=512    agcount=32, agsize=29849728 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=955191296, imaxpct=5
         =                       sunit=128    swidth=256 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=466402, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
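
The geometry above is expressed in 4096-byte filesystem blocks, so it may help to convert it to bytes. The sketch below only does arithmetic on the numbers printed by xfs_info; the constant and helper names are made up for illustration.

```python
# Values copied from the xfs_info output above.
BLOCK_SIZE = 4096                   # data bsize, in bytes
DATA_BLOCKS = 955191296             # data blocks
AG_COUNT, AG_SIZE = 32, 29849728    # agsize is in blocks
SUNIT, SWIDTH = 128, 256            # data sunit/swidth, in blocks
LOG_BLOCKS = 466402                 # internal log blocks

KIB = 1024
GIB = 1024 ** 3


def to_bytes(blocks, block_size=BLOCK_SIZE):
    """Convert a count of filesystem blocks to bytes."""
    return blocks * block_size


stripe_unit_kib = to_bytes(SUNIT) // KIB
stripe_width_kib = to_bytes(SWIDTH) // KIB
fs_gib = to_bytes(DATA_BLOCKS) / GIB
ag_gib = to_bytes(AG_SIZE) / GIB
log_gib = to_bytes(LOG_BLOCKS) / GIB

print(f"stripe unit : {stripe_unit_kib} KiB")
print(f"stripe width: {stripe_width_kib} KiB")
print(f"file system : {fs_gib:.0f} GiB in {AG_COUNT} AGs of {ag_gib:.1f} GiB")
print(f"internal log: {log_gib:.2f} GiB")
```

The stripe unit works out to 512 KiB, which matches the 512k md chunk size, and the stripe width is twice that, which matches the two disks.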
/proc/meminfo:
MemTotal: 32800968 kB
MemFree: 759308 kB
MemAvailable: 27941208 kB
Buffers: 43900 kB
Cached: 26504332 kB
SwapCached: 7560 kB
Active: 16101380 kB
Inactive: 11488252 kB
Active(anon): 813424 kB
Inactive(anon): 228180 kB
Active(file): 15287956 kB
Inactive(file): 11260072 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 16777212 kB
SwapFree: 16715524 kB
Dirty: 2228 kB
Writeback: 0 kB
AnonPages: 1034280 kB
Mapped: 89660 kB
Shmem: 188 kB
Slab: 1508868 kB
SReclaimable: 1097804 kB
SUnreclaim: 411064 kB
KernelStack: 3792 kB
PageTables: 5872 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 33177696 kB
Committed_AS: 1394296 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
Percpu: 7776 kB
HardwareCorrupted: 0 kB
AnonHugePages: 215040 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 0 kB
DirectMap4k: 11682188 kB
DirectMap2M: 21731328 kB
DirectMap1G: 1048576 kB
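
The memory figures above are easier to read in GiB. The following is a minimal sketch that parses a few lines copied from the dump above; the helper names are hypothetical. SReclaimable is the part of the slab that the kernel can reclaim, such as caches.

```python
# Sample lines copied from the /proc/meminfo dump above.
SAMPLE = """\
MemTotal: 32800968 kB
Cached: 26504332 kB
Slab: 1508868 kB
SReclaimable: 1097804 kB
SUnreclaim: 411064 kB
"""


def parse_meminfo(text):
    """Map each field name to its integer value (kB for these fields)."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def kb_to_gib(kb):
    """Convert kB as reported by /proc/meminfo (1024-byte units) to GiB."""
    return kb / (1024 * 1024)


info = parse_meminfo(SAMPLE)
for key in ("MemTotal", "Cached", "Slab", "SReclaimable"):
    print(f"{key:>12}: {kb_to_gib(info[key]):6.2f} GiB")
```

To run it against a live system, the same function can be given the contents of /proc/meminfo instead of the sample string.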
--
Momtchil Momtchev <momtchil@xxxxxxxxxxxx>