Sorry for the noise!
It turns out I wasn't using O_DIRECT. With it, we now see >5GB/s in our
program. Unbelievable how fast SSDs have become.
Thanks for the cool new interface BTW :)
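For anyone hitting the same wall, a quick way to see the page-cache vs.
O_DIRECT difference is plain dd. This is a minimal sketch: the scratch file
stands in for the real test file, and the direct read only works on
filesystems that support O_DIRECT (tmpfs, for example, rejects it).

```shell
# Create a scratch file standing in for the big test file.
dd if=/dev/zero of=scratch.bin bs=1M count=64 status=none

# Buffered read: on a re-read this is served from the page cache.
dd if=scratch.bin of=/dev/null bs=1M status=none && echo buffered-ok

# Direct read: bypasses the page cache entirely. bs must be a multiple
# of the device's logical block size; filesystems without O_DIRECT
# support fail here, hence the fallback message.
dd if=scratch.bin of=/dev/null bs=1M iflag=direct status=none \
    && echo direct-ok \
    || echo "O_DIRECT not supported on this filesystem"

rm -f scratch.bin
```

On a real device the direct read shows the drive's raw throughput, while the
buffered re-read mostly measures memcpy out of the page cache.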
On 29.12.20 19:48, Jens Axboe wrote:
On 12/29/20 9:19 AM, Keith Busch wrote:
On Tue, Dec 29, 2020 at 02:40:57PM +0100, Stefan Lederer wrote:
Hello dear list,
(I hope I do not annoy you as a simple application programmer)
For a seminar paper at my university we reproduced the 2009 paper
"Pathologies of Big Data" by Jacobs, in which he basically reads a
100GB file sequentially from an HDD with some light processing.
We have a PCIe 4.0 SSD rated at up to 7GB/s for reads (Samsung 980), but
nothing we have programmed so far comes even close to that speed (plain
read(), mmap() with optional threads, io_uring, multiple processes), so we
wonder whether it is possible at all.
According to iostat, mmap is the fastest at 4GB/s with a queue depth
of ~3. None of the other approaches goes beyond 2.5GB/s.
We also see some strange effects, like a sequential read() with 16KB
buffers being faster than one with 16MB buffers, and io_uring being a lot
slower than mmap (all tested on Manjaro with kernels 5.8/5.10 and ext4).
So now we are quite lost and would appreciate a hint in the right
direction :)
What is necessary to simply read 100GB of data at 7GB/s?
Is your device running at gen4 speed? Easiest way to tell with an nvme
ssd (assuming you're reading from /dev/nvme0n1) is something like:
# cat /sys/block/nvme0n1/device/device/current_link_speed
If it says less than 16 GT/s, then the link can't sustain 7GB/s reads.
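On a box with several drives, the same check can be looped over every NVMe
namespace (this assumes the standard sysfs layout shown above; the device
names are whatever your system enumerates):

```shell
# Report the negotiated PCIe link speed for every NVMe controller.
# A gen4 x4 link reports "16.0 GT/s PCIe".
for dev in /sys/block/nvme*n1; do
    # If the glob matched nothing, the literal pattern survives.
    [ -e "$dev" ] || { echo "no NVMe devices found"; break; }
    printf '%s: ' "${dev##*/}"
    cat "$dev/device/device/current_link_speed"
done
```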
It does sound a lot like that. Simple test here on a gen4 device:
# cat /sys/block/nvme3n1/device/device/current_link_speed
16.0 GT/s PCIe
# ~axboe/git/fio/fio --name=bw --filename=/dev/nvme3n1 --direct=1 --bs=32k --ioengine=io_uring --iodepth=16 --rw=randread --norandommap
[snip]
READ: bw=6630MiB/s (6952MB/s), 6630MiB/s-6630MiB/s (6952MB/s-6952MB/s), io=36.4GiB (39.1GB), run=5621-5621msec
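Since the original workload is a sequential 100GB stream rather than random
reads, the closer fio invocation would be --rw=read with larger blocks. A
sketch under the same assumptions as above (/dev/nvme3n1 is just the device
from the example; the guard skips the run where fio or the device isn't
available):

```shell
# Sequential-read variant of the random-read test above.
# Requires fio and read access to the raw NVMe device.
if command -v fio >/dev/null && [ -r /dev/nvme3n1 ]; then
    fio --name=bw --filename=/dev/nvme3n1 --direct=1 --bs=1M \
        --ioengine=io_uring --iodepth=16 --rw=read
else
    echo "skipping: fio or /dev/nvme3n1 not available"
fi
```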