On 10/26/22 18:08, Jens Axboe wrote:
On 10/26/22 10:00 AM, Stefan Metzmacher wrote:
Hi Jens,
[...]
b) The major show stopper is that IORING_OP_POLL_ADD calls fget() and holds
that reference while the request is pending. This means that a close() on the
related file descriptor is not able to remove the last reference! This is a
problem for points 3.d, 4.a and 4.b from above.
I doubt IORING_ASYNC_CANCEL_FD would be able to be used as there's not always
code being triggered around a raw close() syscall, which could do a sync cancel.
For now I plan to use epoll_ctl() (or IORING_OP_EPOLL_CTL) and only
register the fd from epoll_create() with IORING_OP_POLL_ADD,
or to keep epoll_wait() as the blocking call and register the io_uring fd
with epoll.
I looked at the related epoll code and found that it uses
a list in struct file->f_ep to keep the reference, which also gets
detached via eventpoll_release_file(), called from __fput().
Would it be possible to move IORING_OP_POLL_ADD to a similar model,
so that close() will cause a cqe with -ECANCELED?
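The epoll behaviour referred to above can be observed from userspace. The following is only a minimal sketch, not code from Samba or from the kernel: the function name epoll_events_after_close() and its keep_dup parameter are made up for illustration. It registers the read end of a pipe with epoll, closes it, and reports whether epoll still delivers an event. With keep_dup set, a dup() keeps the struct file alive, which is roughly the position a pending IORING_OP_POLL_ADD is in while it holds its fget() reference.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Register the read end of a pipe with epoll, make it readable, close it,
 * and return what epoll_wait() reports afterwards (-1 on a setup error).
 * keep_dup != 0 keeps a dup() of the read end open, so close() does not
 * drop the last reference to the underlying struct file.
 * (Hypothetical helper, for illustration only.)
 */
int epoll_events_after_close(int keep_dup)
{
    int p[2], ep, dupfd = -1, n = -1;
    struct epoll_event ev = { .events = EPOLLIN }, out;

    if (pipe(p) < 0)
        return -1;
    ep = epoll_create1(0);
    if (ep < 0)
        goto out_pipe;

    ev.data.fd = p[0];
    if (epoll_ctl(ep, EPOLL_CTL_ADD, p[0], &ev) < 0)
        goto out_ep;
    if (write(p[1], "x", 1) != 1)
        goto out_ep;

    /* Level-triggered epoll must see the pending byte before the close. */
    n = epoll_wait(ep, &out, 1, 0);
    assert(n == 1);
    n = -1;

    if (keep_dup && (dupfd = dup(p[0])) < 0)
        goto out_ep;

    /* Without a dup this is the last reference: the registration goes away. */
    close(p[0]);
    p[0] = -1;
    n = epoll_wait(ep, &out, 1, 0);

out_ep:
    if (dupfd >= 0)
        close(dupfd);
    close(ep);
out_pipe:
    if (p[0] >= 0)
        close(p[0]);
    close(p[1]);
    return n;
}
```

Calling epoll_events_after_close(0) should report no events, because the final close() detached the registration, while epoll_events_after_close(1) should still report one, because the extra reference keeps the file open.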
I'm currently trying to prototype an IORING_POLL_CANCEL_ON_CLOSE
flag that can be passed to POLL_ADD. With that we'll register
the request in &req->file->f_uring_poll (similar to the file->f_ep list for epoll).
Then we only get a real reference to the file during the call to
vfs_poll(); otherwise we drop the fget/fput reference and rely on
an io_uring_poll_release_file() (similar to eventpoll_release_file())
to cancel our registered poll request.
Yes, this is a bit tricky as we hold the file ref across the operation. I'd
be interested in seeing your approach to this, and also how it would
interact with registered files...
Not sure I mentioned it before, but shutdown(2) / IORING_OP_SHUTDOWN
usually helps. Is there anything keeping you from doing that?
Do you only poll sockets or pipes as well?
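To illustrate why shutdown(2) can help here, below is a small sketch that uses plain poll(2) as a stand-in for a pending IORING_OP_POLL_ADD. That substitution is an assumption: it relies on both paths seeing the same readiness state of the socket. The helper name revents_after_shutdown() is invented for this example. It shows that shutdown() makes a socket report events while the fd itself stays open, so it does not depend on dropping the last file reference.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Return the revents reported for a UNIX stream socket after
 * shutdown(SHUT_RDWR), or -1 on error. The fd stays open throughout.
 * (Hypothetical helper, for illustration only.)
 */
int revents_after_shutdown(void)
{
    int s[2], n, ret = -1;
    struct pollfd pfd;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, s) < 0)
        return -1;

    pfd.fd = s[0];
    pfd.events = POLLIN;
    pfd.revents = 0;

    /* Nothing was sent yet, so the idle socket reports no events. */
    n = poll(&pfd, 1, 0);
    assert(n == 0);

    /* shutdown() acts on the socket itself, not on the fd table entry. */
    if (shutdown(s[0], SHUT_RDWR) == 0 && poll(&pfd, 1, 0) == 1)
        ret = pfd.revents;

    close(s[0]);
    close(s[1]);
    return ret;
}
```

On Linux the returned mask is expected to contain POLLIN and POLLHUP. This only covers sockets; pipes have no shutdown(), which is why the question about pipes matters.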
c) A simple pipe based performance test shows the following numbers:
- 'poll': Got 232387.31 pipe events/sec
- 'epoll': Got 251125.25 pipe events/sec
- 'samba_io_uring_ev': Got 210998.77 pipe events/sec
So the io_uring backend is even slower than the 'poll' backend.
I guess the reason is the constant re-submission of IORING_OP_POLL_ADD.
Added some feature autodetection today and I'm now using
IORING_SETUP_COOP_TASKRUN, IORING_SETUP_TASKRUN_FLAG,
IORING_SETUP_SINGLE_ISSUER and IORING_SETUP_DEFER_TASKRUN if supported
by the kernel.
On a 6.1 kernel this improved the performance a lot; it's now faster
than the epoll backend.
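One way such autodetection could look is sketched below. This is not the Samba code; setup_best(), real_setup() and fake_old_setup() are names invented for this example. It assumes that io_uring_setup(2) rejects flags it does not know with EINVAL and falls back to a smaller flag set in that case. The #ifndef values mirror linux/io_uring.h and are only there for older headers.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/io_uring.h>

/* Fallbacks for older UAPI headers; values as in linux/io_uring.h. */
#ifndef IORING_SETUP_COOP_TASKRUN
#define IORING_SETUP_COOP_TASKRUN  (1U << 8)
#endif
#ifndef IORING_SETUP_TASKRUN_FLAG
#define IORING_SETUP_TASKRUN_FLAG  (1U << 9)
#endif
#ifndef IORING_SETUP_SINGLE_ISSUER
#define IORING_SETUP_SINGLE_ISSUER (1U << 12)
#endif
#ifndef IORING_SETUP_DEFER_TASKRUN
#define IORING_SETUP_DEFER_TASKRUN (1U << 13)
#endif
#ifndef __NR_io_uring_setup
#define __NR_io_uring_setup 425
#endif

typedef int (*setup_fn)(unsigned entries, unsigned flags);

/* Real probe: raw io_uring_setup(2); returns a ring fd or -1 with errno. */
int real_setup(unsigned entries, unsigned flags)
{
    struct io_uring_params p;

    memset(&p, 0, sizeof(p));
    p.flags = flags;
    return (int)syscall(__NR_io_uring_setup, entries, &p);
}

/* Test stand-in for a kernel without SINGLE_ISSUER and DEFER_TASKRUN. */
int fake_old_setup(unsigned entries, unsigned flags)
{
    (void)entries;
    if (flags & (IORING_SETUP_SINGLE_ISSUER | IORING_SETUP_DEFER_TASKRUN)) {
        errno = EINVAL;
        return -1;
    }
    return 100; /* pretend ring fd */
}

/*
 * Try flag sets from most to least featureful. On success return the
 * fd from try_setup() and store the accepted flags in *flags_out.
 */
int setup_best(setup_fn try_setup, unsigned entries, unsigned *flags_out)
{
    static const unsigned candidates[] = {
        IORING_SETUP_COOP_TASKRUN | IORING_SETUP_TASKRUN_FLAG |
        IORING_SETUP_SINGLE_ISSUER | IORING_SETUP_DEFER_TASKRUN,
        IORING_SETUP_COOP_TASKRUN | IORING_SETUP_TASKRUN_FLAG |
        IORING_SETUP_SINGLE_ISSUER,
        IORING_SETUP_COOP_TASKRUN | IORING_SETUP_TASKRUN_FLAG,
        0,
    };
    size_t i;

    assert(try_setup && flags_out);
    for (i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++) {
        int fd = try_setup(entries, candidates[i]);

        if (fd >= 0) {
            *flags_out = candidates[i];
            return fd;
        }
        if (errno != EINVAL)
            return -1; /* e.g. EPERM or ENOSYS: retrying will not help */
    }
    return -1;
}
```

With real_setup() the returned fd is a bare ring fd whose queues still have to be mapped; a real program would more likely pass the accepted flags to liburing's io_uring_queue_init_params(). Note that IORING_SETUP_DEFER_TASKRUN requires IORING_SETUP_SINGLE_ISSUER, so the two are always tried together here.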
The key flag is IORING_SETUP_DEFER_TASKRUN. On a different system than above
I'm getting the following numbers:
- 'epoll': Got 114450.16 pipe events/sec
- 'poll': Got 105872.52 pipe events/sec
- 'samba_io_uring_ev-without-defer_taskrun': Got 95564.22 pipe events/sec
- 'samba_io_uring_ev-with-defer_taskrun': Got 122853.85 pipe events/sec
Any chance you can do a run with just IORING_SETUP_COOP_TASKRUN set? I'm
curious how big of an impact the IPI elimination is, where it slots in
compared to the defer taskrun and the default settings.
And if it doesn't take too much time to test, it would also be interesting
to see if there is any impact from IORING_SETUP_SINGLE_ISSUER alone,
without TASKRUN flags.
--
Pavel Begunkov