On 23/12/23 07:48, Carlos Carvalho wrote:
This is finally a summary of a long standing problem. When lots of writes to many files are sent in a short time the kernel gets stuck and stops sending write requests to the disks. Sometimes it recovers and finally sends the modified pages to permanent storage, sometimes not and eventually other functions degrade and the machine crashes.

A simple way to reproduce: expand a kernel source tree, like

    xzcat linux-6.5.tar.xz | tar x -f -

With the default vm settings for dirty_background_ratio and dirty_ratio this will finish quickly with ~1.5GB of dirty pages in ram and ~100k inodes to be written and the kernel gets stuck.

The bug exists in all 6.* kernels; I've tested the latest release of all 6.[1-6]. However some conditions must exist for the problem to appear:
- there must be many inodes to be flushed; just many bytes in a few files don't show the problem
- it happens only with ext4 on a parity raid array
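To see whether the dirty pages are actually draining after the untar, it may help to watch the Dirty and Writeback counters in /proc/meminfo. A minimal sketch, assuming a POSIX shell with awk; the function name dirty_stats is made up for illustration and is not part of any tool:

```shell
# Print the Dirty and Writeback counters (in kB) from a meminfo-style file.
# Reads /proc/meminfo unless another file is given as the first argument.
# "dirty_stats" is a hypothetical helper name used only in this sketch.
dirty_stats() {
    awk '$1 == "Dirty:" || $1 == "Writeback:" { print $1, $2, $3 }' "${1:-/proc/meminfo}"
}
```

Running it in a loop, for example `while sleep 5; do dirty_stats; done`, should show Dirty shrinking on a healthy system; if it stays roughly constant while the flush kworker is busy, that would match the stuck state described above.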
This may be unrelated but there is an open problem that looks somewhat similar. It is tracked at

    https://bugzilla.kernel.org/show_bug.cgi?id=217965

If your fs is mounted with a non-zero 'stripe=' (as RAID arrays usually are), try to get around the issue with

    $ sudo mount -o remount,stripe=0 YourFS

If it makes a difference then you may be looking at a similar issue.
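To find out which filesystems currently have a non-zero stripe, the mount options in /proc/mounts can be scanned. This is only a sketch, assuming a POSIX shell with awk; the function name ext4_stripe_mounts is hypothetical:

```shell
# List ext4 mount points whose options include a non-zero "stripe=" value.
# Reads /proc/mounts unless another file is given as the first argument.
# "ext4_stripe_mounts" is a hypothetical helper name used only in this sketch.
ext4_stripe_mounts() {
    awk '$3 == "ext4" {
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] ~ /^stripe=/ && opts[i] != "stripe=0")
                print $2, opts[i]
    }' "${1:-/proc/mounts}"
}
```

Each output line is a mount point followed by its stripe option; those mount points are the candidates for the remount above.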
I've moved one of our arrays to xfs and everything works fine, so it's either specific to ext4 or xfs is not affected. When the lockup happens the flush kworker starts using 100% cpu permanently. I have not observed the bug in raid10, only in raid[56]. The problem is more easily triggered with 6.[56] but 6.1 is also affected.
The bugzilla issue was seen in kernels 6.5 and later but not in 6.4, so it may not be the same thing.
Limiting dirty_bytes and dirty_background_bytes to low values reduces the probability of lockup, probably because the process generating writes is stopped before too many files are created.
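For anyone who wants to try the same limits, the sketch below prints the sysctl commands for a given pair of limits in MiB rather than running them, so they can be reviewed first. It assumes a POSIX shell; the function name dirty_limit_cmds is hypothetical, and the 64/256 MiB figures in the usage note are arbitrary examples, not values from this thread:

```shell
# Print sysctl commands that cap dirty memory at the given sizes.
#   $1 = background writeback threshold in MiB (vm.dirty_background_bytes)
#   $2 = hard dirty limit in MiB (vm.dirty_bytes)
# "dirty_limit_cmds" is a hypothetical helper name used only in this sketch.
dirty_limit_cmds() {
    bg_mib=$1
    fg_mib=$2
    echo "sysctl -w vm.dirty_background_bytes=$((bg_mib * 1024 * 1024))"
    echo "sysctl -w vm.dirty_bytes=$((fg_mib * 1024 * 1024))"
}
```

For example, `dirty_limit_cmds 64 256` prints the two commands, which can then be run as root. Note that writing the *_bytes settings makes the kernel ignore the corresponding *_ratio settings.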
HTH

-- 
Eyal at Home (eyal@xxxxxxxxxxxxxx)