RE: selective block polling and preadv2/pwritev2 revisited V2

> 
> This series allows polling for completions in the block layer to be selectively
> enabled or disabled on a per-I/O basis.  For this it resurrects the
> preadv2/pwritev2 syscalls that Milosz prepared a while ago (and which are
> much simpler now due to VFS changes that happened in the meantime).
> That approach also had a man page update prepared, which I will resubmit
> with the current flags once this series makes it in.
> 
> Polling for block I/O is important to reduce the latency on flash and post-flash
> storage technologies.  On the fastest NVMe controller I have access to it
> almost halves latencies, from over 7 microseconds to about 4 microseconds.
> But it is only useful if we actually care about the latency of this particular
> I/O, and it is generally a waste if enabled for all I/O to a given device.  This
> series uses the per-I/O flags in preadv2/pwritev2 to control this behavior.  The
> alternative would be a new O_* flag set at open time or via fcntl, but that is
> still too coarse-grained for some applications, and we are starting to run out
> of open flags.

Thanks Christoph for re-submitting this. I for one am very supportive of being able to set priority (and other) flags on a per-I/O basis. I did some testing of this on an NVMe SSD that uses DRAM rather than NAND as its backing store. My absolute numbers are a bit worse than yours, but the improvement of HIPRI over normal I/O was about the same with one thread (3-4us) and a little larger (6-7us) at a higher thread count.

I used a fork of fio with a (rather ugly) hack to enable the new syscalls [1]. I then tested this on a per-thread basis using the following simple shell script. 

#!/bin/bash

FIO=/home/mtr/batesste/fio-git/fio
FILE=/dev/nvme0n1
THREADS=5
TIME=30

# Two job groups against the same device: "lowpri" uses the stock pvsync
# engine (preadv), while "hipri" uses the pv2sync engine from the fio
# fork [1], which issues preadv2() with RWF_HIPRI set.
$FIO --name lowpri --filename=$FILE --size=1G --direct=1 \
    --ioengine=pvsync --rw=randread --bs=4k --runtime=$TIME \
    --numjobs=$THREADS --iodepth=1 --randrepeat=0 \
    --refill_buffers \
    --name hipri --filename=$FILE --size=1G --direct=1 \
    --ioengine=pv2sync --rw=randread --bs=4k --runtime=$TIME \
    --numjobs=$THREADS --iodepth=1 --randrepeat=0 \
    --refill_buffers
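
For anyone who would rather not patch fio, the new syscall can also be driven
directly.  Below is a minimal sketch (my own, not from the series) that issues
a single polled 4k read with preadv2() and RWF_HIPRI.  glibc does not wrap
preadv2() yet, so it goes through syscall(); __NR_preadv2 and the RWF_HIPRI
value are assumed to come from the patched kernel headers, and /dev/nvme0n1 is
just the device I happened to test on.

/* hipri-read.c: one polled 4k direct read via preadv2()/RWF_HIPRI */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef __NR_preadv2
#error "__NR_preadv2 is needed from the patched kernel headers"
#endif

#ifndef RWF_HIPRI
#define RWF_HIPRI 0x00000001    /* per-I/O polling flag from this series */
#endif

int main(void)
{
        struct iovec iov;
        void *buf;
        ssize_t ret;
        int fd;

        fd = open("/dev/nvme0n1", O_RDONLY | O_DIRECT);
        if (fd < 0) {
                perror("open");
                return 1;
        }

        /* O_DIRECT needs an aligned buffer */
        if (posix_memalign(&buf, 4096, 4096)) {
                fprintf(stderr, "posix_memalign failed\n");
                return 1;
        }
        iov.iov_base = buf;
        iov.iov_len = 4096;

        /* the raw syscall takes the file offset as a low/high pair */
        ret = syscall(__NR_preadv2, fd, &iov, 1, 0, 0, RWF_HIPRI);
        if (ret < 0)
                perror("preadv2");
        else
                printf("read %zd bytes (polled completion)\n", ret);

        close(fd);
        return 0;
}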

I also reviewed the code and it all looks good to me!

For the series:

Reviewed-by: Stephen Bates <stephen.bates@xxxxxxxx>
Tested-by: Stephen Bates <stephen.bates@xxxxxxxx> 

> Note that there are plenty of other use cases for preadv2/pwritev2 as well,
> but I'd like to concentrate on this one for now.  Examples are: non-blocking
> reads (the original purpose), per-I/O O_SYNC, user space support for T10
> DIF/DIX application tags, and probably some more.
> 

Totally agree!
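
To make the per-I/O O_SYNC case concrete: something like a hypothetical
RWF_DSYNC flag (the name and value below are illustrative only, not part of
this series) would let an application pay the flush cost only on the writes
that need it, e.g. journal commits, while leaving bulk data writes unflagged.
Reusing the fd and aligned buffer from the sketch above (__NR_pwritev2, like
__NR_preadv2, would come from the patched headers):

/* sketch only: RWF_DSYNC is hypothetical, not part of this series */
#define RWF_DSYNC 0x00000002

struct iovec data   = { .iov_base = buf, .iov_len = 4096 };
struct iovec commit = { .iov_base = buf, .iov_len = 4096 };

/* bulk data: ordinary write, no per-I/O flags */
syscall(__NR_pwritev2, fd, &data, 1, 0, 0, 0);

/* journal commit: only this write is flushed before it returns */
syscall(__NR_pwritev2, fd, &commit, 1, 4096, 0, RWF_DSYNC);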

[1] https://github.com/sbates130272/fio/tree/hipri