I ran some tests on a 4-socket Intel box (Gold 6152, I think) with the files in tmpfs and interleaved 4-way (I think), and with defaults got roughly the same speeds you got on your Intels. I also tested on my 6-core Ryzen 4500U and got almost the same speed (slightly slower) as on your large Ryzen boxes with many NUMA nodes, so it has to be effectively using only a single NUMA node and a single CPU. I did test my 4500U machine with fewer cores enabled: 1 core got 18M, 2 cores got 23M, and 3 got 32M, so it did not appear to scale past 3 cores. I also tested on an ancient A8-5600K and it was almost the same speed as the Ryzen. From the calls, there must be a lot of memory reads. And I got the same speed using shm, using tmpfs, using tmpfs+hugepages, and using files on a disk that should have been in the file cache.
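To put the per-core numbers in perspective, here is a rough sketch that turns them into speedup and parallel efficiency relative to the 1-core run. The names `measured` and `scaling` are made up for illustration, and the values are just the "M" figures quoted above (the unit is whatever the benchmark reported).

```python
# Throughput on the 4500U with a limited number of cores enabled.
# Values are the "M" figures from the runs above; the unit is not stated.
measured = {1: 18.0, 2: 23.0, 3: 32.0}


def scaling(measured):
    """Return (cores, speedup, efficiency) tuples relative to the 1-core run."""
    base = measured[1]
    rows = []
    for cores in sorted(measured):
        speedup = measured[cores] / base   # how much faster than 1 core
        efficiency = speedup / cores       # 1.0 would be perfect linear scaling
        rows.append((cores, speedup, efficiency))
    return rows


for cores, speedup, efficiency in scaling(measured):
    print(f"{cores} core(s): speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

With these figures, 3 cores give well under a 2x speedup over 1 core, which fits the picture of something other than CPU count being the limit.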