We're getting closer... Some good bug fixes in this release, fully stable for me.

I decided to make the SQ/CQ ring interface the primary innovation here, and build the new features on top of that. This means that the io_submit/io_getevents interface doesn't get any of the advanced features, outside of polled IO. If you want any of the new fancy features, you really should use the ring interface, it's more efficient.

As before, there's a sample application as an introduction to how to use the ring interface:

http://git.kernel.dk/cgit/fio/plain/t/aio-ring.c

Outside of that, I also built a fio engine for it yesterday. If you update your fio to current -git, you'll have an aio-ring engine. As an example, if you want to run buffered async IO, you would do:

[aio-buffered]
ioengine=aio-ring
sqwq=1
filename=/dev/nvme1n1
direct=0
ramp_time=2
runtime=10s
iodepth=64
rw=randread
norandommap
bs=4k

which enables the SQ/CQ workqueue offload (sqwq=1). As before, cached data is served without punting to a workqueue.

aio-ring also supports the sqthread interface, and it supports fixed buffers, and (of course) polled IO. For a speed run, you'd do:

[aio-polled]
ioengine=aio-ring
hipri=1
fixedbufs=1
filename=/dev/nvme1n1
direct=1
ramp_time=2
runtime=10s
iodepth=64
iodepth_batch_complete_max=64
iodepth_batch=32
rw=randread
norandommap
bs=4k
cpus_allowed=0

which enables polling (hipri=1) and fixed buffers (fixedbufs=1).

Interface wise, I made an important change to how the SQ ring works. Previously the ring was itself an array of iocbs. This was somewhat inflexible, as I found out when I converted applications to use it, as it requires an ordered relationship between application IO units and iocbs. Now the ring is just indexes into the iocb array, so you can submit multiple iocbs in one go without having to rely on iocbs being ordered in the array.
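To make the index-based SQ ring a bit more concrete, here's a minimal userspace sketch of the idea. Everything in it (demo_iocb, demo_sq_ring, sq_push, sq_pop, SQ_ENTRIES) is a hypothetical stand-in made up for illustration, not the actual layout in aio_abi.h, and it is single-threaded, so it leaves out the memory barriers a ring shared with the kernel would need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SQ_ENTRIES 8	/* hypothetical ring size, must be a power of two */

/* Hypothetical stand-in for an iocb, not the real struct iocb */
struct demo_iocb {
	uint64_t user_data;
	uint64_t offset;
	uint32_t len;
};

/* Hypothetical SQ ring: it holds indexes into the iocb array, not iocbs */
struct demo_sq_ring {
	uint32_t head;			/* consumer position */
	uint32_t tail;			/* producer position */
	uint32_t array[SQ_ENTRIES];	/* iocb indexes, in submission order */
};

/* Producer: queue iocb number 'iocb_index'. Returns 0, or -1 if the ring is full. */
static inline int sq_push(struct demo_sq_ring *ring, uint32_t iocb_index)
{
	if (ring->tail - ring->head == SQ_ENTRIES)
		return -1;
	ring->array[ring->tail & (SQ_ENTRIES - 1)] = iocb_index;
	ring->tail++;
	return 0;
}

/* Consumer: fetch the next queued index. Returns 0, or -1 if the ring is empty. */
static inline int sq_pop(struct demo_sq_ring *ring, uint32_t *iocb_index)
{
	if (ring->head == ring->tail)
		return -1;
	*iocb_index = ring->array[ring->head & (SQ_ENTRIES - 1)];
	ring->head++;
	return 0;
}

/*
 * Queue iocbs 5, 2 and 7 in that order, then drain the ring and check that
 * they come out in submission order, regardless of where they sit in the
 * iocb array. Returns the number of entries consumed.
 */
static inline int demo_out_of_order(void)
{
	struct demo_iocb iocbs[SQ_ENTRIES] = {{0}};
	struct demo_sq_ring ring = {0};
	const uint32_t order[] = {5, 2, 7};
	uint32_t idx;
	int seen = 0;

	for (size_t i = 0; i < 3; i++) {
		iocbs[order[i]].user_data = order[i];
		assert(sq_push(&ring, order[i]) == 0);
	}
	while (sq_pop(&ring, &idx) == 0) {
		assert(idx == order[seen]);
		assert(iocbs[idx].user_data == idx);
		seen++;
	}
	return seen;
}
```

The point of the indirection is that the application can fill iocb slots in whatever order suits its own IO units, and only the order of indexes in the ring decides what gets submitted when.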
As before, find this in my aio-poll branch:

http://git.kernel.dk/cgit/linux-block/log/?h=aio-poll

or clone it here:

git://git.kernel.dk/linux-block aio-poll

and the code is based on the pending 4.21 block changes, for-4.21/block.

Since v7:

- Fix fput error handling in aio_prep_rw()
- Dropped USERIOCB for non-ring interface
- Fix CQ ring sizing
- Change sq_ring to be ring of indexes, not iocbs
- Rename aio_iocb_ring -> aio_sq_ring, aio_io_event_ring -> aio_cq_ring
- Fix not passing submission state always, if setup
- Fix double mmget/use_mm for SQTHREAD for rare conditions
- Fix __kthread_bind_mask() warning for SQTHREAD exit
- Comments
- Various bug fixes

 Documentation/filesystems/vfs.txt      |    3 +
 Documentation/sysctl/fs.txt            |    8 +-
 arch/x86/entry/syscalls/syscall_64.tbl |    2 +
 block/bio.c                            |   33 +-
 fs/aio.c                               | 1891 ++++++++++++++++++++++--
 fs/block_dev.c                         |   34 +-
 fs/file.c                              |   15 +-
 fs/file_table.c                        |   10 +-
 fs/gfs2/file.c                         |    2 +
 fs/iomap.c                             |   57 +-
 fs/xfs/xfs_file.c                      |    1 +
 include/linux/bio.h                    |    1 +
 include/linux/blk_types.h              |    2 +
 include/linux/file.h                   |    2 +
 include/linux/fs.h                     |    5 +-
 include/linux/iomap.h                  |    1 +
 include/linux/syscalls.h               |    4 +
 include/uapi/asm-generic/unistd.h      |    4 +-
 include/uapi/linux/aio_abi.h           |   37 +
 kernel/sys_ni.c                        |    2 +
 20 files changed, 1916 insertions(+), 198 deletions(-)

-- 
Jens Axboe