Hi,

small insight: even given my dataset that can reliably trigger this (after around 1.5 hours of rsyncing), it does not trigger on a specific set of files. I’ve deleted the data and started the rsync on a fresh directory (not a fresh filesystem, I can’t delete that as it carries important data), but it doesn’t always get stuck on the same files, even though rsync processes them in a repeatable order.

I’m wondering how to generate more insight from that. Maybe keeping a blktrace log might help? It sounds like the specific pattern relies on XFS doing a specific thing there …

Wild idea: maybe running the xfstests suite on an in-memory RAID 6 setup could reproduce this? I’m guessing that the XFS people do not regularly run their test suite on a layered setup like mine, with encryption and software RAID?

Christian

> On 15. Aug 2024, at 08:19, Christian Theune <ct@xxxxxxxxxxxxxxx> wrote:
> 
> Hi,
> 
>> On 14. Aug 2024, at 10:53, Christian Theune <ct@xxxxxxxxxxxxxxx> wrote:
>> 
>> Hi,
>> 
>>> On 12. Aug 2024, at 20:37, John Stoffel <john@xxxxxxxxxxx> wrote:
>>> 
>>> I'd probably just do the RAID6 tests first, get them out of the way.
>> 
>> Alright, those are running right now - I’ll let you know what happens.
> 
> I’m not making progress here. I can’t reproduce this on an in-memory loopback RAID 6. However, I also can’t fully reproduce the rsync: for me this only triggered after around 1.5 hours of progress on the NVMe, which resulted in the hangup. I can only create around 20 GiB worth of RAID 6 volume on this machine. I’ve tried running rsync until it exhausts the space, deleting the content and running rsync again, but I feel this isn’t sufficient to trigger the issue. :(
> 
> I’m trying to find out whether any specific pattern in the files around the time it locks up might be relevant here, and to run the rsync over that portion.
> 
> On the plus side, I have a script now that can create the various loopback setups quickly, so I can try out things as needed.
> Not that valuable without a reproducer yet, though.
> 
> @Yu: you mentioned that you might be able to provide me with a kernel that produces more error logging to diagnose this? Any chance we could try that route?
> 
> Christian
> 
> --
> Christian Theune · ct@xxxxxxxxxxxxxxx · +49 345 219401 0
> Flying Circus Internet Operations GmbH · https://flyingcircus.io
> Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
> HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick

Kind regards,
Christian Theune

--
Christian Theune · ct@xxxxxxxxxxxxxxx · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick
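P.S.: for reference, the in-memory loopback RAID 6 plus blktrace route discussed above could look roughly like this. This is only a sketch, not the actual script from this thread; all device names, sizes and paths are placeholders, and the real stack additionally has dm-crypt between md and XFS:

```shell
#!/bin/sh
# Rough sketch of an in-memory loopback RAID 6 reproducer with block
# tracing. Device names, sizes and paths are placeholders; run as root.
set -e

mkdir -p /tmp/md-repro && cd /tmp/md-repro

# Create four sparse backing files (RAID 6 needs at least four members)
# and attach each to the next free loop device.
devs=""
for i in 0 1 2 3; do
    truncate -s 5G "disk$i.img"
    devs="$devs $(losetup -f --show "disk$i.img")"
done

# Assemble a RAID 6 array from the loop devices and put XFS on top.
# (A cryptsetup/dm-crypt layer could be inserted here to match the
# production stack more closely.)
mdadm --create /dev/md42 --level=6 --raid-devices=4 $devs
mkfs.xfs -f /dev/md42
mount /dev/md42 /mnt

# Trace block-layer activity on the array while the workload runs, so
# a hang can be correlated with the last requests issued to the device.
blktrace -d /dev/md42 -o md42 &
rsync -a /source/dataset/ /mnt/ || true
kill %1

# Turn the binary trace into something readable.
blkparse -i md42 > md42.txt
```

If the hang reproduces, the tail of md42.txt should show which requests were last queued and completed at the point where rsync got stuck.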