On Thu, 5 Aug 2021 21:10:40 +0000 "Finlayson, James M CIV (USA)"
<james.m.finlayson4.civ@xxxxxxxx> wrote:

> BLUF upfront with the 5.14-rc3 kernel that our SA built - md0, a
> 10+1+1 RAID5: 5.332M IOPS, 20.3GiB/s; md1, a 10+1+1 RAID5: 5.892M
> IOPS, 22.5GiB/s - the best hero numbers I've ever seen on mdraid
> RAID5 IOPS. I think the kernel patch is good. Prior was socket0
> 1.263M IOPS, 4934MiB/s and socket1 1.071M IOPS, 4183MiB/s... I'm
> willing to help push this as hard as we can until we hit a
> bottleneck outside of our control.

That's great! Thanks for sharing your results. I'd appreciate it if
you could run a sequential-read workload (128k/256k block sizes) so
that we get a better sense of the throughput potential here.

> In my strict NUMA adherence with mdraid, I see lots of variability
> between reboots/assembles. Sometimes md0 wins, sometimes md1 wins,
> and in my earlier runs md0 and md1 were notionally balanced. I
> change nothing but see this variance. I just cranked up a week-long
> extended run of these 10+1+1s under the 5.14-rc3 kernel, and right
> now md0 is doing 5M IOPS and md1 6.3M.

Given my limited experience with the code in question, I suspect it is
not really optimized for NUMA awareness, so your observations seem
quite reasonable to me. I don't have a good tip for that, I'm afraid.

I'm currently focusing on thin-provisioned logical volumes (LVM - it
actually has a much worse read bottleneck), but we plan to revisit
md/raid5 soon to improve write workloads. I'll ping you when I have a
patch that might be relevant.

Cheers,
Gal
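
P.S. In case it helps, here is a rough sketch of the kind of fio job
I have in mind for the sequential-read test. The ioengine, queue
depths, runtime and the NUMA node numbers for md0/md1 are assumptions
on my side - please substitute whatever you used for the random-read
runs.

; Sketch only - ioengine, iodepth/numjobs, runtime and the NUMA node
; numbers are guesses; adjust to match your topology. The numa_*
; options need fio built with libnuma support.
[global]
ioengine=io_uring
direct=1
; sequential reads at 128k; repeat the whole run with bs=256k
rw=read
bs=128k
iodepth=32
numjobs=16
time_based=1
runtime=300
group_reporting

; one job per array, pinned to the socket its member drives hang off
[md0-seq]
filename=/dev/md0
numa_cpu_nodes=0
numa_mem_policy=bind:0

[md1-seq]
filename=/dev/md1
numa_cpu_nodes=1
numa_mem_policy=bind:1

Both arrays are driven concurrently, and with group_reporting each
job section should report its own aggregate, so we'd get per-array
throughput with both sockets loaded at the same time.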