On 3/3/22 6:52 PM, Pavel Begunkov wrote:
> On 3/3/22 16:31, Jens Axboe wrote:
>> On 3/3/22 7:40 AM, Jens Axboe wrote:
>>> On 3/3/22 7:36 AM, Jens Axboe wrote:
>>>> The only potential oddity here is that the fd passed back is not a
>>>> legitimate fd. io_uring does support poll(2) on its file descriptor, so
>>>> that could cause some confusion even if I don't think anyone actually
>>>> does poll(2) on io_uring.
>>>
>>> Side note - the only implication here is that we then likely can't make
>>> the optimized behavior the default, it has to be an IORING_SETUP_REG
>>> flag which tells us that the application is aware of this limitation.
>>> Though I guess close(2) might mess with that too... Hmm.
>>
>> Not sure I can find a good approach for that. Tried out your patch and
>> made some fixes:
>>
>> - Missing free on final tctx free
>> - Rename registered_files to registered_rings
>> - Fix off-by-ones in checking max registration count
>> - Use kcalloc
>> - Rename ENTER_FIXED_FILE -> ENTER_REGISTERED_RING
>> - Don't pass in tctx to io_uring_unreg_ringfd()
>> - Get rid of forward declaration for adding tctx node
>> - Get rid of extra file pointer in io_uring_enter()
>> - Fix deadlock in io_ringfd_register()
>> - Use io_uring_rsrc_update rather than add a new struct type
>>
>> Patch I ran below.
>>
>> Ran some testing here, and on my laptop, running:
>>
>> axboe@m1pro-kvm ~/g/fio (master)> t/io_uring -N1 -s1 -f0
>> polled=1, fixedbufs=1/0, register_files=1, buffered=0, QD=128
>> Engine=io_uring, sq_ring=128, cq_ring=128
>> submitter=0, tid=673
>> IOPS=6627K, IOS/call=1/1, inflight=()
>> IOPS=6995K, IOS/call=1/1, inflight=()
>> IOPS=6992K, IOS/call=1/1, inflight=()
>> IOPS=7005K, IOS/call=1/1, inflight=()
>> IOPS=6999K, IOS/call=1/1, inflight=()
>>
>> and with registered ring
>>
>> axboe@m1pro-kvm ~/g/fio (master)> t/io_uring -N1 -s1 -f1
>> polled=1, fixedbufs=1/0, register_files=1, buffered=0, QD=128
>> Engine=io_uring, sq_ring=128, cq_ring=128
>> submitter=0, tid=687
>> ring register 0
>> IOPS=7714K, IOS/call=1/1, inflight=()
>> IOPS=8030K, IOS/call=1/1, inflight=()
>> IOPS=8025K, IOS/call=1/1, inflight=()
>> IOPS=8015K, IOS/call=1/1, inflight=()
>> IOPS=8037K, IOS/call=1/1, inflight=()
>>
>> which is about a 15% improvement, pretty massive...
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index ad3e0b0ab3b9..8a1f97054b71 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
> [...]
>>  static void *io_uring_validate_mmap_request(struct file *file,
>>  					    loff_t pgoff, size_t sz)
>>  {
>> @@ -10191,12 +10266,23 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
>>  	io_run_task_work();
>>  	if (unlikely(flags & ~(IORING_ENTER_GETEVENTS | IORING_ENTER_SQ_WAKEUP |
>> -			       IORING_ENTER_SQ_WAIT | IORING_ENTER_EXT_ARG)))
>> +			       IORING_ENTER_SQ_WAIT | IORING_ENTER_EXT_ARG |
>> +			       IORING_ENTER_REGISTERED_RING)))
>>  		return -EINVAL;
>> -	f = fdget(fd);
>> -	if (unlikely(!f.file))
>> -		return -EBADF;
>> +	if (flags & IORING_ENTER_REGISTERED_RING) {
>> +		struct io_uring_task *tctx = current->io_uring;
>> +
>> +		if (fd >= IO_RINGFD_REG_MAX || !tctx)
>> +			return -EINVAL;
>> +		f.file = tctx->registered_rings[fd];
>
> btw, array_index_nospec(), possibly not only here.

Yeah, was thinking that earlier too in fact but forgot about it. Might
as well, though I don't think it's strictly required as it isn't a user
table.

-- 
Jens Axboe
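
For reference, here is a rough userspace sketch of the clamping idea behind
array_index_nospec() as it would apply to the registered_rings lookup quoted
above. It is not the kernel code and does not by itself mitigate speculation
in userspace. The mask expression follows the generic C fallback as I
understand it, and RINGFD_REG_MAX, registered_rings_model and lookup_ring()
are hypothetical stand-ins invented for illustration. It also assumes that
right-shifting a negative long is an arithmetic shift, which holds for gcc
and clang.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG  (sizeof(long) * CHAR_BIT)
#define RINGFD_REG_MAX 16UL /* hypothetical stand-in for IO_RINGFD_REG_MAX */

/*
 * Branch-free mask: all ones when index < size, zero otherwise.
 * When index < size, neither index nor (size - 1 - index) has the top bit
 * set, so the inverted value is negative and the arithmetic shift smears
 * the sign bit across the word. When index >= size, the subtraction wraps,
 * the top bit is set, and the result is zero.
 */
static unsigned long index_mask_nospec(unsigned long index, unsigned long size)
{
	return ~(long)(index | (size - 1UL - index)) >> (BITS_PER_LONG - 1);
}

/* Model of the per-task table; strings stand in for struct file pointers. */
static const char *registered_rings_model[RINGFD_REG_MAX];

/* Bounds check first, then clamp the index before it is used as an offset. */
static const char *lookup_ring(unsigned long fd)
{
	if (fd >= RINGFD_REG_MAX)
		return NULL;
	/* Even if the branch above is mispredicted, fd is forced to 0 here. */
	fd &= index_mask_nospec(fd, RINGFD_REG_MAX);
	return registered_rings_model[fd];
}
```

In the kernel the equivalent would presumably be a single
fd = array_index_nospec(fd, IO_RINGFD_REG_MAX) placed between the bounds
check and the table load.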