On Thu, 5 Aug 2021 21:10:40 +0000 "Finlayson, James M CIV (USA)"
<james.m.finlayson4.civ@xxxxxxxx> wrote:

> BLUF upfront with the 5.14-rc3 kernel that our SA built - md0, a
> 10+1+1 RAID5: 5.332M IOPS, 20.3GiB/s; md1, a 10+1+1 RAID5: 5.892M
> IOPS, 22.5GiB/s - the best hero numbers I've ever seen on mdraid
> RAID5 IOPS. I think the kernel patch is good. Prior was socket0
> 1.263M IOPS, 4934MiB/s and socket1 1.071M IOPS, 4183MiB/s... I'm
> willing to help push this as hard as we can until we hit a
> bottleneck outside of our control.

That's great! Thanks for sharing your results. I'd appreciate it if
you could run a sequential-read workload (128k/256k block sizes) so
that we get a better sense of the throughput potential here.

> In my strict NUMA adherence with mdraid, I see lots of variability
> between reboots/assembles. Sometimes md0 wins, sometimes md1 wins,
> and in my earlier runs md0 and md1 were notionally balanced. I
> change nothing but see this variance. I just cranked up a week-long
> extended run of these 10+1+1s under the 5.14-rc3 kernel, and right
> now md0 is doing 5M IOPS and md1 6.3M.

Given my limited experience with the code in question, I suspect it is
not really optimized for NUMA awareness, so your observations seem
quite reasonable to me. I don't have a good tip for that, I'm afraid.

I'm currently focusing on thin-provisioned logical volumes (LVM - it
actually has a much worse read bottleneck), but we plan to revisit
md/raid5 soon to improve write workloads. I'll ping you when I have a
patch that might be relevant.

Cheers,
Gal
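
P.S. In case it helps, here is a rough sketch of the kind of fio job
I have in mind for the sequential-read test. The ioengine, queue
depths, runtime and the NUMA node numbers for md0/md1 are assumptions
on my side - please substitute whatever you used for the random-read
runs.

; Sketch only - ioengine, iodepth/numjobs, runtime and the NUMA node
; numbers are guesses; adjust to match your topology. The numa_*
; options need fio built with libnuma support.
[global]
ioengine=io_uring
direct=1
; sequential reads at 128k; repeat the whole run with bs=256k
rw=read
bs=128k
iodepth=32
numjobs=16
time_based=1
runtime=300
group_reporting

; one job per array, pinned to the socket its member drives hang off
[md0-seq]
filename=/dev/md0
numa_cpu_nodes=0
numa_mem_policy=bind:0

[md1-seq]
filename=/dev/md1
numa_cpu_nodes=1
numa_mem_policy=bind:1

Both arrays are driven concurrently, and with group_reporting each
job section should report its own aggregate, so we'd get per-array
throughput with both sockets loaded at the same time.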