Re: Problems replacing epoll with io_uring in tevent

Hi Jens,

9. The above mostly works, but manual testing and our massive automated regression tests
    found the following problems:

    a) Related to https://github.com/axboe/liburing/issues/684 I was also wondering
       about the return value of io_uring_submit_and_wait_timeout(),
       but in addition I noticed that the timeout parameter doesn't work
       as expected: the function waits for twice the timeout value.
       I hacked a fix here:
       https://git.samba.org/?p=metze/samba/wip.git;a=commitdiff;h=06fec644dd9f5748952c8b875878e0e1b0000d33

Thanks for doing an upstream fix for the problem.
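
For reference, the call in question gets driven roughly like this (a minimal
sketch with made-up function names, not the actual samba_io_uring_ev code):

  #include <errno.h>
  #include <liburing.h>

  /* Submit pending sqes and wait up to *timeout for at least one cqe. */
  static int wait_for_events(struct io_uring *ring,
                             struct __kernel_timespec *timeout)
  {
          struct io_uring_cqe *cqe = NULL;
          int ret;

          /* The timeout is implemented via an internal timeout request;
           * on success the return value is the number of submitted sqes,
           * on failure -errno (e.g. -ETIME when the timeout expired). */
          ret = io_uring_submit_and_wait_timeout(ring, &cqe, 1, timeout, NULL);
          if (ret < 0) {
                  return ret;
          }

          /* ... process cqes and io_uring_cq_advance() ... */
          return 0;
  }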

    b) The major showstopper is that IORING_OP_POLL_ADD holds a reference from
       fget() while it's pending, which means that a close() on the related file
       descriptor is not able to drop the last reference! This is a problem for
       points 3.d, 4.a and 4.b from above.

       I doubt IORING_ASYNC_CANCEL_FD can be used here, as there's not always
       code being triggered around a raw close() syscall that could do a sync cancel.

       For now I plan to use epoll_ctl() (or IORING_OP_EPOLL_CTL) and only
       register the fd from epoll_create() with IORING_OP_POLL_ADD,
       or to keep epoll_wait() as the blocking call and register the io_uring fd
       with epoll.

       I looked at the related epoll code and found that it uses
       a list in struct file->f_ep to keep the reference, which also gets
       detached via eventpoll_release_file() called from __fput().

       Would it be possible to move IORING_OP_POLL_ADD to a similar model,
       so that close() causes a cqe with -ECANCELED?

I'm currently trying to prototype an IORING_POLL_CANCEL_ON_CLOSE
flag that can be passed to POLL_ADD. With that flag we register
the request in &req->file->f_uring_poll (similar to the file->f_ep list for epoll).
We then only take a real reference to the file during the call to
vfs_poll(); otherwise we drop the fget()/fput() reference and rely on
an io_uring_poll_release_file() (similar to eventpoll_release_file())
to cancel our registered poll request.
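
Independent of that prototype, the epoll-bridge fallback mentioned in b) above
would look roughly like this (an illustrative sketch with simplified error
handling, not the actual tevent code):

  #include <errno.h>
  #include <stdint.h>
  #include <liburing.h>
  #include <sys/epoll.h>
  #include <poll.h>

  /* epoll owns the per-fd registrations, so a plain close() on a watched
   * fd still detaches it via eventpoll_release_file(); io_uring only ever
   * polls the single epoll fd and never pins the real fds via fget(). */

  static int bridge_add_fd(int epfd, int fd, uint32_t events)
  {
          struct epoll_event ev = { .events = events, .data.fd = fd };

          return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
  }

  static int bridge_arm(struct io_uring *ring, int epfd)
  {
          struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

          if (sqe == NULL) {
                  return -EAGAIN;
          }
          /* One-shot poll on the epoll fd, re-armed after each completion. */
          io_uring_prep_poll_add(sqe, epfd, POLLIN);
          return io_uring_submit(ring);
  }

  static int bridge_drain(int epfd, struct epoll_event *events, int maxevents)
  {
          /* Non-blocking: the cqe already told us the epoll fd is readable. */
          return epoll_wait(epfd, events, maxevents, 0);
  }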

    c) A simple pipe-based performance test shows the following numbers:
       - 'poll':               Got 232387.31 pipe events/sec
       - 'epoll':              Got 251125.25 pipe events/sec
       - 'samba_io_uring_ev':  Got 210998.77 pipe events/sec
       So the io_uring backend is even slower than the 'poll' backend.
       I guess the reason is the constant re-submission of IORING_OP_POLL_ADD.

Added some feature autodetection today and I'm now using
IORING_SETUP_COOP_TASKRUN, IORING_SETUP_TASKRUN_FLAG,
IORING_SETUP_SINGLE_ISSUER and IORING_SETUP_DEFER_TASKRUN if supported
by the kernel.
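
The autodetection boils down to trying the newer setup flags first and
falling back when the kernel rejects them, roughly like this (a simplified
sketch, not the actual samba_io_uring_ev code; it assumes a liburing that
already defines these flags):

  #include <string.h>
  #include <liburing.h>

  /* Try the most aggressive flag combination first; older kernels reject
   * unknown setup flags, so fall back step by step to a plain ring. */
  static int setup_ring(struct io_uring *ring, unsigned entries)
  {
          static const unsigned flag_sets[] = {
                  IORING_SETUP_COOP_TASKRUN | IORING_SETUP_TASKRUN_FLAG |
                  IORING_SETUP_SINGLE_ISSUER | IORING_SETUP_DEFER_TASKRUN,
                  IORING_SETUP_COOP_TASKRUN | IORING_SETUP_TASKRUN_FLAG,
                  0,
          };
          int ret = 0;

          for (size_t i = 0; i < sizeof(flag_sets)/sizeof(flag_sets[0]); i++) {
                  struct io_uring_params params;

                  memset(&params, 0, sizeof(params));
                  params.flags = flag_sets[i];

                  ret = io_uring_queue_init_params(entries, ring, &params);
                  if (ret == 0) {
                          return 0;
                  }
          }
          return ret;
  }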

On a 6.1 kernel this improved the performance a lot; it's now faster
than the epoll backend.

The key flag is IORING_SETUP_DEFER_TASKRUN. On a different system than above
I'm getting the following numbers:
- epoll:                                    Got 114450.16 pipe events/sec
- poll:                                     Got 105872.52 pipe events/sec
- samba_io_uring_ev-without-defer_taskrun:  Got  95564.22 pipe events/sec
- samba_io_uring_ev-with-defer_taskrun:     Got 122853.85 pipe events/sec

       My hope would be that IORING_POLL_ADD_MULTI + IORING_POLL_ADD_LEVEL
       would be able to avoid the performance problem with samba_io_uring_ev
       compared to epoll.

I've started on an IORING_POLL_ADD_MULTI + IORING_POLL_ADD_LEVEL prototype,
but it's not very far along yet, and due to the IORING_SETUP_DEFER_TASKRUN
speedup I'll postpone working on IORING_POLL_ADD_LEVEL.
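
For reference, the existing multishot poll can already be armed like this
with liburing (just a sketch; IORING_POLL_ADD_LEVEL itself is only a
prototype and not shown here):

  #include <errno.h>
  #include <liburing.h>
  #include <poll.h>

  /* Arm a multishot poll once; the kernel posts a cqe each time the fd
   * becomes ready, so userspace no longer re-submits POLL_ADD per event.
   * A cqe without IORING_CQE_F_MORE means the poll terminated and has to
   * be re-armed. */
  static int arm_multishot_poll(struct io_uring *ring, int fd, void *private_data)
  {
          struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

          if (sqe == NULL) {
                  return -EAGAIN;
          }
          io_uring_prep_poll_multishot(sqe, fd, POLLIN);
          io_uring_sqe_set_data(sqe, private_data);
          return io_uring_submit(ring);
  }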

metze



