On 3/3/22 16:31, Jens Axboe wrote:
On 3/3/22 7:40 AM, Jens Axboe wrote:
On 3/3/22 7:36 AM, Jens Axboe wrote:
The only potential oddity here is that the fd passed back is not a
legitimate fd. io_uring does support poll(2) on its file descriptor, so
that could cause some confusion even if I don't think anyone actually
does poll(2) on io_uring.
Side note - the only implication here is that we then likely can't make
the optimized behavior the default, it has to be an IORING_SETUP_REG
flag which tells us that the application is aware of this limitation.
Though I guess close(2) might mess with that too... Hmm.
Not sure I can find a good approach for that. Tried out your patch and
made some fixes:
- Missing free on final tctx free
- Rename registered_files to registered_rings
- Fix off-by-ones in checking max registration count
- Use kcalloc
- Rename ENTER_FIXED_FILE -> ENTER_REGISTERED_RING
- Don't pass in tctx to io_uring_unreg_ringfd()
- Get rid of forward declaration for adding tctx node
- Get rid of extra file pointer in io_uring_enter()
- Fix deadlock in io_ringfd_register()
- Use io_uring_rsrc_update rather than add a new struct type
Patch I ran below.
Ran some testing here, and on my laptop, running:
axboe@m1pro-kvm ~/g/fio (master)> t/io_uring -N1 -s1 -f0
polled=1, fixedbufs=1/0, register_files=1, buffered=0, QD=128
Engine=io_uring, sq_ring=128, cq_ring=128
submitter=0, tid=673
IOPS=6627K, IOS/call=1/1, inflight=()
IOPS=6995K, IOS/call=1/1, inflight=()
IOPS=6992K, IOS/call=1/1, inflight=()
IOPS=7005K, IOS/call=1/1, inflight=()
IOPS=6999K, IOS/call=1/1, inflight=()
and with a registered ring:
axboe@m1pro-kvm ~/g/fio (master)> t/io_uring -N1 -s1 -f1
polled=1, fixedbufs=1/0, register_files=1, buffered=0, QD=128
Engine=io_uring, sq_ring=128, cq_ring=128
submitter=0, tid=687
ring register 0
IOPS=7714K, IOS/call=1/1, inflight=()
IOPS=8030K, IOS/call=1/1, inflight=()
IOPS=8025K, IOS/call=1/1, inflight=()
IOPS=8015K, IOS/call=1/1, inflight=()
IOPS=8037K, IOS/call=1/1, inflight=()
which is about a 15% improvement (roughly 7.0M -> 8.0M IOPS), pretty massive...
diff --git a/fs/io_uring.c b/fs/io_uring.c
index ad3e0b0ab3b9..8a1f97054b71 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
[...]
 static void *io_uring_validate_mmap_request(struct file *file,
 					    loff_t pgoff, size_t sz)
 {
@@ -10191,12 +10266,23 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 	io_run_task_work();
 	if (unlikely(flags & ~(IORING_ENTER_GETEVENTS | IORING_ENTER_SQ_WAKEUP |
-			       IORING_ENTER_SQ_WAIT | IORING_ENTER_EXT_ARG)))
+			       IORING_ENTER_SQ_WAIT | IORING_ENTER_EXT_ARG |
+			       IORING_ENTER_REGISTERED_RING)))
 		return -EINVAL;
-	f = fdget(fd);
-	if (unlikely(!f.file))
-		return -EBADF;
+	if (flags & IORING_ENTER_REGISTERED_RING) {
+		struct io_uring_task *tctx = current->io_uring;
+
+		if (fd >= IO_RINGFD_REG_MAX || !tctx)
+			return -EINVAL;
+		f.file = tctx->registered_rings[fd];
btw, fd is user-controlled here, so the index wants array_index_nospec(),
and possibly not only here.
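To illustrate the point, here is a minimal userspace sketch of the clamping arithmetic, modelled on the kernel's generic array_index_mask_nospec() fallback. The names index_mask_nospec(), index_nospec(), lookup_registered() and RINGFD_REG_MAX are made up for this example and are not from the patch. It also assumes an arithmetic right shift of negative values (as gcc and clang do), and a plain userspace build gives no real speculation guarantee, so it only shows the pattern: bounds check first, then mask the index before it is used.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical table size for this sketch only. */
#define RINGFD_REG_MAX 16UL

/*
 * Returns all ones when index < size and 0 otherwise, without a branch.
 * Relies on arithmetic right shift of a negative long (assumption).
 */
static unsigned long index_mask_nospec(unsigned long index, unsigned long size)
{
	return (unsigned long)(~(long)(index | (size - 1UL - index)) >>
			       (sizeof(long) * CHAR_BIT - 1));
}

/* Clamp index to 0 if it is out of bounds, leave it alone otherwise. */
static unsigned long index_nospec(unsigned long index, unsigned long size)
{
	return index & index_mask_nospec(index, size);
}

/* Hypothetical lookup: architectural bounds check, then clamp, then load. */
static void *lookup_registered(void **table, unsigned long fd)
{
	assert(table != NULL);
	if (fd >= RINGFD_REG_MAX)
		return NULL;
	fd = index_nospec(fd, RINGFD_REG_MAX);
	return table[fd];
}
```

In the hunk above, the equivalent would presumably be to clamp fd against IO_RINGFD_REG_MAX between the range check and the registered_rings[] load, and to do the same wherever else a user-supplied offset indexes that array.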
--
Pavel Begunkov