[GIT PULL] Support for the io_uring IO interface

Hi Linus,

This pull request adds support for a new IO interface, io_uring.
io_uring allows an application to communicate with the kernel through
two rings, the submission queue (SQ) ring and the completion queue (CQ)
ring. This allows for very efficient handling of IO; see the v5 posting
for some basic numbers:

https://lore.kernel.org/linux-block/20190116175003.17880-1-axboe@xxxxxxxxx/
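To make the two-ring model concrete, here is a minimal single-threaded C sketch. Every name in it (struct sketch_sq, sq_push(), fake_kernel_process(), cq_pop(), and so on) is hypothetical and not part of the io_uring ABI, and the "kernel" is just a function in the same process. It shows the core idea: each ring has a head and a tail that only ever increase, the producer advances the tail, the consumer advances the head, and a power-of-two ring size lets `counter & mask` select the slot. The real interface also shares the rings with the kernel through mmap(2), places an indirection array between the SQ ring and the SQE array, and needs proper memory ordering on head/tail updates. None of that is modeled here, and how this sketch handles a full CQ is its own simplification.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, stripped-down entries. The real struct io_uring_sqe and
 * struct io_uring_cqe carry an opcode, fd, offset, buffer, flags, etc. */
struct sketch_sqe { uint64_t user_data; int32_t value; };
struct sketch_cqe { uint64_t user_data; int32_t res; };

#define RING_ENTRIES 8u                  /* must be a power of two */
#define RING_MASK    (RING_ENTRIES - 1u)
_Static_assert((RING_ENTRIES & RING_MASK) == 0, "ring size must be a power of two");

/* head/tail are free-running counters; `counter & RING_MASK` picks the
 * slot, and unsigned wraparound keeps `tail - head` correct. */
struct sketch_sq { uint32_t head, tail; struct sketch_sqe e[RING_ENTRIES]; };
struct sketch_cq { uint32_t head, tail; struct sketch_cqe e[RING_ENTRIES]; };

/* Application side: queue one request. Returns 0 if the SQ is full. */
static int sq_push(struct sketch_sq *sq, uint64_t user_data, int32_t value)
{
    if (sq->tail - sq->head == RING_ENTRIES)
        return 0;
    struct sketch_sqe *sqe = &sq->e[sq->tail & RING_MASK];
    sqe->user_data = user_data;
    sqe->value = value;
    sq->tail++;            /* the producer owns the SQ tail */
    return 1;
}

/* Stand-in for the kernel: consume pending SQEs from the SQ head and post
 * one CQE each at the CQ tail. The "IO" is just doubling `value`.
 * Simplification of this sketch: it stops consuming while the CQ is full. */
static unsigned fake_kernel_process(struct sketch_sq *sq, struct sketch_cq *cq)
{
    unsigned done = 0;
    while (sq->head != sq->tail && cq->tail - cq->head < RING_ENTRIES) {
        const struct sketch_sqe *sqe = &sq->e[sq->head & RING_MASK];
        struct sketch_cqe *cqe = &cq->e[cq->tail & RING_MASK];
        cqe->user_data = sqe->user_data;  /* echoed back untouched */
        cqe->res = sqe->value * 2;
        sq->head++;        /* the consumer owns the SQ head */
        cq->tail++;        /* the producer owns the CQ tail */
        done++;
    }
    return done;
}

/* Application side: reap one completion. Returns 0 if the CQ is empty. */
static int cq_pop(struct sketch_cq *cq, struct sketch_cqe *out)
{
    assert(cq->tail - cq->head <= RING_ENTRIES);
    if (cq->head == cq->tail)
        return 0;
    *out = cq->e[cq->head & RING_MASK];
    cq->head++;            /* the consumer owns the CQ head; slot is free again */
    return 1;
}
```

Usage follows the same rhythm as the real rings: queue requests with sq_push(), let fake_kernel_process() turn them into completions, then drain them with cq_pop(). Each completion carries back the user_data of the request that produced it, which is how an application matches completions to submissions.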

Beyond raw efficiency, the interface is also flexible and extensible,
and allows for future use cases like the upcoming NVMe key-value store
API, networked IO, and so on. It also supports async buffered IO,
something that we've always failed to support in the kernel.

Beyond basic IO features, it supports async polled IO as well. This
particular feature was already tested at Facebook months ago on flash
storage boxes, with 25-33% improvements. It makes polled IO actually
useful for real-world use cases, where even basic flash sees a nice win
in terms of efficiency, latency, and performance. These boxes were IOPS
bound before; now they are not.

This series adds three new system calls: one for setting up an io_uring
instance (io_uring_setup(2)), one for submitting/completing IO
(io_uring_enter(2)), and one for auxiliary functions like registering
file sets, buffers, etc. (io_uring_register(2)). With the help of Arnd,
I've coordinated the syscall numbers, so the merge on that front should
be painless.

Jon did a write-up of the interface a while back which, except for minor
details that have been tweaked, is still accurate. Find it here:

https://lwn.net/Articles/776703/

Huge thanks to Al Viro for helping get the reference cycle code correct,
and to Jann Horn for his extensive reviews focused on both security and
bugs in general.

There's a userspace library that provides basic functionality for
applications that don't need or want to care about how to fiddle with
the rings directly. It has helpers to allow applications to easily set
up an io_uring instance, and submit/complete IO through it without
knowing about the intricacies of the rings. It also includes man pages
(thanks to Jeff Moyer), and will continue to grow helper functions and
features as time progresses. Find it here:

git://git.kernel.dk/liburing

Fio has full support for the raw interface, both in the form of an IO
engine (io_uring) and a small test application (t/io_uring) that can
exercise and benchmark the interface.

Note that this branch sits on top of my for-5.1/block branch, since the
multi-page bvec changes caused a few conflicts with the pre-mapped
buffer support. I also moved a few prep patches to that branch today,
which is why it appears to have been rebased recently (I moved the 4
bottom patches from io_uring to for-5.1/block).

Please consider this feature for 5.1, so we can finally have something
for IO that's fast, efficient, and feature rich, instead of the sad
niche case that is aio/libaio.


  git://git.kernel.dk/linux-block.git tags/io_uring-20190301


----------------------------------------------------------------
Christoph Hellwig (1):
      io_uring: add fsync support

Jens Axboe (13):
      Add io_uring IO interface
      io_uring: support for IO polling
      fs: add fget_many() and fput_many()
      io_uring: use fget/fput_many() for file references
      io_uring: batch io_kiocb allocation
      block: implement bio helper to add iter bvec pages to bio
      io_uring: add support for pre-mapped user IO buffers
      net: split out functions related to registering inflight socket files
      io_uring: add file set registration
      io_uring: add submission polling
      io_uring: add io_kiocb ref count
      io_uring: add support for IORING_OP_POLL
      io_uring: allow workqueue item to handle multiple buffered requests

 arch/x86/entry/syscalls/syscall_32.tbl |    3 +
 arch/x86/entry/syscalls/syscall_64.tbl |    3 +
 block/bio.c                            |   62 +-
 fs/Makefile                            |    1 +
 fs/file.c                              |   15 +-
 fs/file_table.c                        |    9 +-
 fs/io_uring.c                          | 2969 ++++++++++++++++++++++++++++++++
 include/linux/file.h                   |    2 +
 include/linux/fs.h                     |   13 +-
 include/linux/sched/user.h             |    2 +-
 include/linux/syscalls.h               |    8 +
 include/net/af_unix.h                  |    1 +
 include/uapi/asm-generic/unistd.h      |    8 +-
 include/uapi/linux/io_uring.h          |  137 ++
 init/Kconfig                           |    9 +
 kernel/sys_ni.c                        |    3 +
 net/Makefile                           |    2 +-
 net/unix/Kconfig                       |    5 +
 net/unix/Makefile                      |    2 +
 net/unix/af_unix.c                     |   63 +-
 net/unix/garbage.c                     |   68 +-
 net/unix/scm.c                         |  151 ++
 net/unix/scm.h                         |   10 +
 23 files changed, 3400 insertions(+), 146 deletions(-)
 create mode 100644 fs/io_uring.c
 create mode 100644 include/uapi/linux/io_uring.h
 create mode 100644 net/unix/scm.c
 create mode 100644 net/unix/scm.h

-- 
Jens Axboe