On Sun, Dec 24, 2023 at 11:39:05PM -0800, Daniel Dawson wrote:
> On 12/22/23 12:48 PM, Carlos Carvalho wrote:
> > This is finally a summary of a long standing problem. When lots of writes to
> > many files are sent in a short time the kernel gets stuck and stops sending
> > write requests to the disks. Sometimes it recovers and finally sends the
> > modified pages to permanent storage, sometimes not and eventually other
> > functions degrade and the machine crashes.
> >
> > A simple way to reproduce: expand a kernel source tree, like
> > xzcat linux-6.5.tar.xz | tar x -f -
> This sounds almost exactly like a problem I was having, right down to
> triggering it by writing the files of a kernel tree, though the details in
> my case are slightly different. I wanted to report it, but wanted to get a
> better handle on it and never managed it, and now I've changed my setup such
> that it doesn't happen anymore.
>
> > - it happens only with ext4 on a parity raid array
>
> This is where it differs for me. I experienced it only with btrfs. But I had

Hi Daniel,

So I think there are some other people noticing something similar on btrfs as
well [1]. Maybe this is related to the issue you are noticing although they
have not mentioned anything about raid in btrfs.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=2242391

Regards,
ojaswin

> two arrays with it, one on SSDs and one on HDDs. The HDD array exhibited the
> problem almost exclusively (the SSDs, I think, exhibited it once in several
> months, while the HDDs did pretty much every time I tried to compile a new
> kernel (until I started working around it), and even from some other things,
> which was a couple of times a week). I imagine because HDDs much slower and
> therefore allow more data to get cached.
>
> Now that I've switched the HDD array to ext4, I haven't experienced the
> issue even once. But the setup has better performance, so maybe it's just
> because it flushes its writes faster.
>
> --
> PGP fingerprint: 5BBD5080FEB0EF7F142F8173D572B791F7B4422A