For the grand introduction to this feature, see my original posting
here:

https://lore.kernel.org/linux-block/20181117235317.7366-1-axboe@xxxxxxxxx/

and refer to the previous postings of this patchset for the features
that were added there. Outside of "just" supporting polled IO, it also
adds support for user mapped IOCBs, so we don't have to copy those for
every IO.

A new addition in this version is support for pre-mapped buffers as
well. If an application uses fixed IO buffers, it can set
IOCTX_FLAG_FIXEDBUFS along with IOCTX_FLAG_USERIOCB. The iocbs that are
mapped in should have the maximum length already set, and the buffer
field pointing to the right location. That eliminates the need to do
get_user_pages() for every IO (a rough userspace sketch of this setup
follows at the end of this mail).

Everything is solid for me in all the testing I have done: no problems,
crashes, or corruptions observed.

For the testing below, 'Mainline' refers to current -git from Linus,
'aio-poll' is the aio-poll branch with none of the new features
enabled, and 'aio-poll-all' is the aio-poll branch with user iocbs,
polling, and user mapped buffers all turned on. In other words,
mainline and aio-poll are running the exact same workload, and
aio-poll-all is running that workload with the new features enabled.

All testing done with fio. Latencies quoted are 50th percentile. All
testing is done with a single thread, using a maximum of one core in
the system. Testing is run on two devices - one that supports high
peak IOPS, and one that is low latency.

Peak IOPS testing on an NVMe device that supports high IOPS:

Depth       Mainline      aio-poll      aio-poll-all
============================================================
1              77K            80K            132K
2             145K           163K            262K
4             287K           342K            514K
8             560K           582K            824K
16            616K           727K           1013K
32            636K           773K           1155K
64            635K           776K           1230K

Low latency testing on a low latency device:

Depth       Mainline            aio-poll            aio-poll-all
============================================================
1            84K / 8.5 usec      87K / 8.3 usec     168K / 5.0 usec
2           201K / 7.4 usec     208K / 7.1 usec     330K / 5.0 usec
4           389K / 7.7 usec     398K / 7.2 usec     547K / 6.1 usec

It's worth noting that the average IO submission time for
'aio-poll-all' is 660 nsec, with aio-poll at 1.8 - 2.0 usec, and
mainline at 1.8 - 2.1 usec.

As before, patches are against my 'mq-perf' branch, and can also be
found in my aio-poll branch.

 Documentation/filesystems/vfs.txt      |    3 +
 arch/x86/entry/syscalls/syscall_64.tbl |    1 +
 block/bio.c                            |   36 +-
 fs/aio.c                               | 1055 +++++++++++++++++++++---
 fs/block_dev.c                         |   36 +-
 fs/file.c                              |   15 +-
 fs/file_table.c                        |   10 +-
 fs/gfs2/file.c                         |    2 +
 fs/iomap.c                             |   56 +-
 fs/xfs/xfs_file.c                      |    1 +
 include/linux/bio.h                    |    1 +
 include/linux/blk_types.h              |    1 +
 include/linux/file.h                   |    2 +
 include/linux/fs.h                     |    5 +-
 include/linux/iomap.h                  |    1 +
 include/linux/syscalls.h               |    2 +
 include/linux/uio.h                    |    3 +
 include/uapi/asm-generic/unistd.h      |    4 +-
 include/uapi/linux/aio_abi.h           |    6 +
 kernel/sys_ni.c                        |    1 +
 lib/iov_iter.c                         |   35 +-
 21 files changed, 1109 insertions(+), 167 deletions(-)

--
Jens Axboe
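
To make the fixed-buffer setup a bit more concrete, here is a minimal
userspace sketch. It assumes the new setup syscall from this series is
an io_setup2()-style call taking (nr_events, flags, iocbs, ctx), and it
guesses at the IOCTX_FLAG_* values; the syscall number, prototype, and
flag values all come from the patched headers, so check the actual
patches for the real interface. The only point here is to show the
iocbs being pre-allocated with the maximum length and buffer pointer
filled in before the context is created, so the kernel can pin the
buffers once instead of doing get_user_pages() per IO.

/*
 * Illustrative only: io_setup2() prototype and flag values below are
 * assumptions about this patchset, not a confirmed interface.
 */
#include <stdlib.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/aio_abi.h>

#define QD		32
#define BUF_SIZE	4096

/* Flags from this series; real values live in the patched aio_abi.h */
#ifndef IOCTX_FLAG_USERIOCB
#define IOCTX_FLAG_USERIOCB	(1 << 0)	/* assumed value */
#endif
#ifndef IOCTX_FLAG_FIXEDBUFS
#define IOCTX_FLAG_FIXEDBUFS	(1 << 1)	/* assumed value */
#endif

/* Assumed wrapper; __NR_io_setup2 comes from the patched unistd.h */
static int io_setup2(unsigned nr_events, unsigned flags,
		     struct iocb *iocbs, aio_context_t *ctx)
{
	return syscall(__NR_io_setup2, nr_events, flags, iocbs, ctx);
}

int main(void)
{
	struct iocb *iocbs;
	aio_context_t ctx = 0;
	int i;

	/* One iocb per queue slot; with IOCTX_FLAG_USERIOCB these stay
	 * mapped for the life of the context, so the kernel doesn't
	 * have to copy an iocb in for every submission. */
	iocbs = calloc(QD, sizeof(*iocbs));
	if (!iocbs)
		return 1;

	for (i = 0; i < QD; i++) {
		void *buf;

		/* Fixed buffer: maximum length and buffer pointer are
		 * set up front, so the pages can be pinned once at
		 * context setup rather than per IO. */
		if (posix_memalign(&buf, 4096, BUF_SIZE))
			return 1;
		iocbs[i].aio_buf = (unsigned long) buf;
		iocbs[i].aio_nbytes = BUF_SIZE;
	}

	if (io_setup2(QD, IOCTX_FLAG_USERIOCB | IOCTX_FLAG_FIXEDBUFS,
		      iocbs, &ctx))
		return 1;

	/* IOs would then be submitted by filling in the remaining
	 * fields (fd, opcode, offset) of the pre-mapped iocbs. */
	return 0;
}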