[GIT PULL] io_uring updates for 6.10-rc1

Hi Linus,

Here are the io_uring updates and fixes for the 6.10 kernel merge
window. This pull request contains:

- Greatly improve send zerocopy performance, by enabling coalescing of
  sent buffers. MSG_ZEROCOPY already does this with send(2) and
  sendmsg(2), but the io_uring side did not. In local testing, the
  crossover point for send zerocopy being faster is now around 3000 byte
  packets, and it performs better than the sync syscall variants as
  well. This feature relies on a shared branch with net-next, which was
  pulled into both branches.

- Unification of how async preparation is done across opcodes.
  Previously, opcodes that required extra memory for async retry would
  allocate that as needed, using on-stack state until that was the case.
  If async retry was needed, the on-stack state was adjusted
  appropriately for a retry and then copied to the allocated memory.
  This led to some fragile and ugly code, particularly for read/write
  handling, and made storage retries more difficult than they needed to
  be. Allocate the memory upfront, as it's cheap from our pools, and use
  that state consistently both initially and also from the retry side.

- Move away from using remap_pfn_range() for mapping the rings. This is
  really not the right interface to use and can cause lifetime issues or
  leaks. Additionally, it means the ring sq/cq arrays need to be
  physically contiguous, which can cause problems in production with
  larger rings when services are restarted, as memory can be very
  fragmented at that point. Move to using vm_insert_page(s) for the ring
  sq/cq arrays, and apply the same treatment to mapped ring provided
  buffers. This also helps unify the code we have dealing with
  allocating and mapping memory. It's hard to see in the diffstat as
  we're adding a few features as well, but this removes about 400 lines
  of code from the codebase.

- Add support for bundles for send/recv. When used with provided
  buffers, bundles support sending or receiving more than one buffer at
  a time, improving efficiency by only needing to call into the
  networking stack once for multiple sends or receives.

- Tweaks for our accept operations, supporting both a DONTWAIT flag for
  skipping poll arm and retry if we can, and a POLLFIRST flag that the
  application can use to skip the initial accept attempt and rely purely
  on poll for triggering the operation. Both of these have identical
  flags on the receive side already.

- Make the task_work ctx locking unconditional. We had various code
  paths here that would do a mix of lock/trylock and set the task_work
  state to whether or not it was locked. All of that goes away: we lock
  it unconditionally and get rid of the state flag indicating whether
  it's locked or not. The state struct still exists as an empty type,
  and can go away in the future.

- Add support for specifying NOP completion values, allowing it to be
  used for error handling testing.

- Use set/test bit for io-wq worker flags. Not strictly needed, but also
  doesn't hurt and helps silence a KCSAN warning.

- Cleanups for io-wq locking and work assignments, closing a tiny race
  where cancelations would not be able to find the work item reliably.

- Misc fixes, cleanups, and improvements.
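
As an illustration of why allocating the async state upfront is cheap,
here is a minimal userspace sketch of an array-based object cache, the
general shape of pool the async preparation item above relies on. It is
not the kernel's alloc_cache code: the obj_cache names, the malloc()
fallback, and the sizes are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names throughout; this is not the kernel's io_alloc_cache. */
struct obj_cache {
	void **entries;		/* array used as a stack of free objects */
	unsigned int nr;	/* number of objects currently cached */
	unsigned int max;	/* capacity of entries[] */
	size_t obj_size;	/* size of each object */
};

/* Returns 0 on success, -1 if the entries array cannot be allocated. */
static int obj_cache_init(struct obj_cache *c, unsigned int max,
			  size_t obj_size)
{
	assert(max > 0 && obj_size > 0);
	c->entries = calloc(max, sizeof(*c->entries));
	if (!c->entries)
		return -1;
	c->nr = 0;
	c->max = max;
	c->obj_size = obj_size;
	return 0;
}

/* Pop a cached object if one exists, otherwise fall back to malloc(). */
static void *obj_cache_get(struct obj_cache *c)
{
	if (c->nr)
		return c->entries[--c->nr];
	return malloc(c->obj_size);
}

/* Recycle an object. Returns 1 if cached, 0 if the cache was full
 * and the object was freed instead. */
static int obj_cache_put(struct obj_cache *c, void *obj)
{
	if (c->nr < c->max) {
		c->entries[c->nr++] = obj;
		return 1;
	}
	free(obj);
	return 0;
}

/* Free every cached object and the entries array itself. */
static void obj_cache_free(struct obj_cache *c)
{
	while (c->nr)
		free(c->entries[--c->nr]);
	free(c->entries);
	c->entries = NULL;
}
```

With a warm cache, obj_cache_get() is a single array pop, so taking the
object upfront for every request costs little even when no retry ends
up being needed.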

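For the io-wq worker flags item, a rough userspace analogue using C11
atomics shows the idea. The kernel's set_bit() and test_bit() are not
available outside the kernel, so the worker_*() helpers and the bit
names below are hypothetical stand-ins. The point is that a plain
'flags |= bit' is a non-atomic read-modify-write, which a data race
detector can flag when other contexts touch the same word, whereas
atomic bit operations mark the access explicitly.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit numbers (not masks), for illustration only. */
enum {
	WORKER_UP_BIT		= 0,
	WORKER_RUNNING_BIT	= 1,
	WORKER_FREE_BIT		= 2,
};

struct worker {
	atomic_ulong flags;
};

/* Atomically set one bit in the flags word. */
static void worker_set_bit(struct worker *w, unsigned int bit)
{
	assert(bit < 8 * sizeof(unsigned long));
	atomic_fetch_or(&w->flags, 1UL << bit);
}

/* Atomically clear one bit in the flags word. */
static void worker_clear_bit(struct worker *w, unsigned int bit)
{
	assert(bit < 8 * sizeof(unsigned long));
	atomic_fetch_and(&w->flags, ~(1UL << bit));
}

/* Atomically load the flags word and report whether the bit is set. */
static bool worker_test_bit(struct worker *w, unsigned int bit)
{
	assert(bit < 8 * sizeof(unsigned long));
	return (atomic_load(&w->flags) >> bit) & 1UL;
}
```

Using bit numbers rather than masks mirrors how set_bit()-style
interfaces are called, and keeps each update to a single atomic
operation on the shared word.
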
Please pull!


The following changes since commit 0bbac3facb5d6cc0171c45c9873a2dc96bea9680:

  Linux 6.9-rc4 (2024-04-14 13:38:39 -0700)

are available in the Git repository at:

  git://git.kernel.dk/linux.git tags/for-6.10/io_uring-20240511

for you to fetch changes up to deb1e496a83557896fe0cca0b8af01c2a97c0dc6:

  io_uring: support to inject result for NOP (2024-05-10 06:09:45 -0600)

----------------------------------------------------------------
for-6.10/io_uring-20240511

----------------------------------------------------------------
Breno Leitao (1):
      io_uring/io-wq: Use set_bit() and test_bit() at worker->flags

Gabriel Krisman Bertazi (4):
      io_uring: Avoid anonymous enums in io_uring uapi
      io-wq: write next_work before dropping acct_lock
      io-wq: Drop intermediate step between pending list and active work
      io_uring: Require zeroed sqe->len on provided-buffers send

Jens Axboe (52):
      nvme/io_uring: use helper for polled completions
      io_uring: flush delayed fallback task_work in cancelation
      io_uring: remove timeout/poll specific cancelations
      io_uring/alloc_cache: shrink default max entries from 512 to 128
      io_uring/net: switch io_send() and io_send_zc() to using io_async_msghdr
      io_uring/net: switch io_recv() to using io_async_msghdr
      io_uring/net: unify cleanup handling
      io_uring/net: always setup an io_async_msghdr
      io_uring/net: always set kmsg->msg.msg_control_user before issue
      io_uring/net: get rid of ->prep_async() for receive side
      io_uring/net: get rid of ->prep_async() for send side
      io_uring: kill io_msg_alloc_async_prep()
      io_uring/net: remove (now) dead code in io_netmsg_recycle()
      io_uring/net: add iovec recycling
      io_uring/net: drop 'kmsg' parameter from io_req_msg_cleanup()
      io_uring/rw: always setup io_async_rw for read/write requests
      io_uring: get rid of struct io_rw_state
      io_uring/rw: cleanup retry path
      io_uring/rw: add iovec recycling
      io_uring/net: move connect to always using async data
      io_uring/uring_cmd: switch to always allocating async data
      io_uring/uring_cmd: defer SQE copying until it's needed
      io_uring: drop ->prep_async()
      io_uring/alloc_cache: switch to array based caching
      io_uring/poll: shrink alloc cache size to 32
      io_uring: refill request cache in memory order
      io_uring: re-arrange Makefile order
      io_uring: use the right type for work_llist empty check
      mm: add nommu variant of vm_insert_pages()
      io_uring: get rid of remap_pfn_range() for mapping rings/sqes
      io_uring: use vmap() for ring mapping
      io_uring: unify io_pin_pages()
      io_uring/kbuf: vmap pinned buffer ring
      io_uring/kbuf: use vm_insert_pages() for mmap'ed pbuf ring
      io_uring: use unpin_user_pages() where appropriate
      io_uring: move mapping/allocation helpers to a separate file
      io_uring: fix warnings on shadow variables
      io_uring/kbuf: remove dead define
      io_uring: ensure overflow entries are dropped when ring is exiting
      io_uring/sqpoll: work around a potential audit memory leak
      io_uring/rw: ensure retry condition isn't lost
      io_uring/net: add generic multishot retry helper
      io_uring/net: add provided buffer support for IORING_OP_SEND
      io_uring/kbuf: add helpers for getting/peeking multiple buffers
      io_uring/net: support bundles for send
      io_uring/net: support bundles for recv
      Merge branch 'for-uring-ubufops' of git://git.kernel.org/pub/scm/linux/kernel/git/kuba/linux into for-6.10/io_uring
      io_uring/rw: reinstate thread check for retries
      io_uring/msg_ring: cleanup posting to IOPOLL vs !IOPOLL ring
      io_uring/filetable: don't unnecessarily clear/reset bitmap
      io_uring/net: add IORING_ACCEPT_DONTWAIT flag
      io_uring/net: add IORING_ACCEPT_POLL_FIRST flag

Jiapeng Chong (1):
      io_uring: Remove unused function

Joel Granados (1):
      io_uring: Remove the now superfluous sentinel elements from ctl_table array

Ming Lei (4):
      io_uring: kill dead code in io_req_complete_post
      io_uring: return void from io_put_kbuf_comp()
      io_uring: fail NOP if non-zero op flags is passed in
      io_uring: support to inject result for NOP

Pavel Begunkov (33):
      io_uring/cmd: move io_uring_try_cancel_uring_cmd()
      io_uring/cmd: kill one issue_flags to tw conversion
      io_uring/cmd: fix tw <-> issue_flags conversion
      io_uring/cmd: document some uring_cmd related helpers
      io_uring/rw: avoid punting to io-wq directly
      io_uring: force tw ctx locking
      io_uring: remove struct io_tw_state::locked
      io_uring: refactor io_fill_cqe_req_aux
      io_uring: get rid of intermediate aux cqe caches
      io_uring: remove current check from complete_post
      io_uring: refactor io_req_complete_post()
      io_uring: clean up io_lockdep_assert_cq_locked
      io_uring: turn implicit assumptions into a warning
      io_uring: remove async request cache
      io_uring: remove io_req_put_rsrc_locked()
      io_uring/net: merge ubuf sendzc callbacks
      io_uring/net: get rid of io_notif_complete_tw_ext
      io_uring/net: set MSG_ZEROCOPY for sendzc in advance
      io_uring: separate header for exported net bits
      io_uring: unexport io_req_cqe_overflow()
      io_uring: remove extra SQPOLL overflow flush
      io_uring: open code io_cqring_overflow_flush()
      io_uring: always lock __io_cqring_overflow_flush
      io_uring: consolidate overflow flushing
      io_uring/notif: refactor io_tx_ubuf_complete()
      io_uring/notif: remove ctx var from io_notif_tw_complete
      io_uring/notif: shrink account_pages to u32
      net: extend ubuf_info callback to ops structure
      net: add callback for setting a ubuf_info to skb
      io_uring/notif: simplify io_notif_flush()
      io_uring/notif: implement notification stacking
      io_uring/net: fix sendzc lazy wake polling
      io_uring/notif: disable LAZY_WAKE for linked notifs

Ruyi Zhang (1):
      io_uring/timeout: remove duplicate initialization of the io_timeout list.

linke li (1):
      io_uring/msg_ring: reuse ctx->submitter_task read using READ_ONCE instead of re-reading it

 drivers/net/tap.c                   |   2 +-
 drivers/net/tun.c                   |   2 +-
 drivers/net/xen-netback/common.h    |   5 +-
 drivers/net/xen-netback/interface.c |   2 +-
 drivers/net/xen-netback/netback.c   |  11 +-
 drivers/nvme/host/ioctl.c           |  15 +-
 drivers/vhost/net.c                 |   8 +-
 include/linux/io_uring.h            |   6 -
 include/linux/io_uring/cmd.h        |  24 +
 include/linux/io_uring/net.h        |  18 +
 include/linux/io_uring_types.h      |  19 +-
 include/linux/skbuff.h              |  21 +-
 include/uapi/linux/io_uring.h       |  38 +-
 io_uring/Makefile                   |  15 +-
 io_uring/alloc_cache.h              |  59 ++-
 io_uring/cancel.c                   |   4 +-
 io_uring/fdinfo.c                   |   4 +-
 io_uring/filetable.c                |   4 +-
 io_uring/futex.c                    |  30 +-
 io_uring/futex.h                    |   5 +-
 io_uring/io-wq.c                    |  67 +--
 io_uring/io_uring.c                 | 665 +++++-----------------------
 io_uring/io_uring.h                 |  33 +-
 io_uring/kbuf.c                     | 318 ++++++++------
 io_uring/kbuf.h                     |  64 ++-
 io_uring/memmap.c                   | 336 ++++++++++++++
 io_uring/memmap.h                   |  25 ++
 io_uring/msg_ring.c                 |  12 +-
 io_uring/net.c                      | 852 +++++++++++++++++++++---------------
 io_uring/net.h                      |  29 +-
 io_uring/nop.c                      |  26 +-
 io_uring/notif.c                    | 108 +++--
 io_uring/notif.h                    |  13 +-
 io_uring/opdef.c                    |  65 ++-
 io_uring/opdef.h                    |   9 +-
 io_uring/poll.c                     |  15 +-
 io_uring/poll.h                     |   9 +-
 io_uring/refs.h                     |   7 +
 io_uring/register.c                 |   3 +-
 io_uring/rsrc.c                     |  47 +-
 io_uring/rsrc.h                     |  13 +-
 io_uring/rw.c                       | 585 ++++++++++++-------------
 io_uring/rw.h                       |  25 +-
 io_uring/sqpoll.c                   |   8 +
 io_uring/timeout.c                  |   9 +-
 io_uring/uring_cmd.c                | 122 +++++-
 io_uring/uring_cmd.h                |   8 +-
 io_uring/waitid.c                   |   2 +-
 mm/nommu.c                          |   7 +
 net/core/skbuff.c                   |  36 +-
 net/socket.c                        |   2 +-
 51 files changed, 2050 insertions(+), 1762 deletions(-)
 create mode 100644 include/linux/io_uring/net.h
 create mode 100644 io_uring/memmap.c
 create mode 100644 io_uring/memmap.h

-- 
Jens Axboe




