Hi,

> On 15. Aug 2024, at 13:14, Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> On 2024/08/15 18:03, Christian Theune wrote:
>> Hi,
>> small insight: even given my dataset that can reliably trigger this (after around 1.5 hours of rsyncing) it does not trigger on a specific set of files. I’ve deleted the data and started the rsync on a fresh directory (not a fresh filesystem, I can’t delete that as it carries important data) but it doesn’t always get stuck on the same files, even though rsync processes them in a repeatable order.
>> I’m wondering how to generate more insights from that. Maybe keeping a blktrace log might help?
>> It sounds like the specific pattern relies on XFS doing a specific thing there …
>> Wild idea: maybe running the xfstests suite on an in-memory raid 6 setup could reproduce this?
>> I’m guessing that the xfs people do not regularly run their test suite on a layered setup like mine with encryption and software raid?
>
> That sounds great.

Alright. I will try that.

>>> @Yu: you mentioned that you might be able to provide me a kernel that produces more error logging to diagnose this? Any chance we could try that route?
>
> Yes, however, I still need some time to sort out the internal process of
> raid5. I'm quite busy with some other work stuff and I'm familiar with
> raid1/10, but not too much about raid5. :(
>
> Main idea is to figure out why IO are not dispatched to underlying
> disks.

Sure, thanks - I’m happy to be patient. :)

Christian

-- 
Christian Theune · ct@xxxxxxxxxxxxxxx · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick