On 10/26/22 10:00 AM, Stefan Metzmacher wrote:
> Hi Jens,
>
>> 9. The above works mostly, but manual testing and our massive automated regression tests
>> found the following problems:
>>
>> a) Related to https://github.com/axboe/liburing/issues/684 I was also wondering
>> about the return value of io_uring_submit_and_wait_timeout(),
>> but in addition I noticed that the timeout parameter doesn't work
>> as expected: the function waits for twice the timeout value.
>> I hacked a fix here:
>> https://git.samba.org/?p=metze/samba/wip.git;a=commitdiff;h=06fec644dd9f5748952c8b875878e0e1b0000d33
>
> Thanks for doing an upstream fix for the problem.

No problem - have you been able to test the current repo in general? I
want to cut a 2.3 release shortly, but since that particular change
impacts any kind of cqe waiting, it would be nice to have a bit more
confidence in it.

>> b) The major show stopper is that IORING_OP_POLL_ADD calls fget() while
>> it's pending. This means that a close() on the related file descriptor
>> is not able to remove the last reference! This is a problem for points 3.d,
>> 4.a and 4.b from above.
>>
>> I doubt IORING_ASYNC_CANCEL_FD could be used, as there's not always
>> code being triggered around a raw close() syscall that could do a sync cancel.
>>
>> For now I plan to use epoll_ctl() (or IORING_OP_EPOLL_CTL) and only
>> register the fd from epoll_create() with IORING_OP_POLL_ADD,
>> or I keep epoll_wait() as the blocking call and register the io_uring fd
>> with epoll.
>>
>> I looked at the related epoll code and found that it uses
>> a list in struct file->f_ep to keep the reference, which also gets
>> detached via eventpoll_release_file() called from __fput().
>>
>> Would it be possible to move IORING_OP_POLL_ADD to a similar model,
>> so that close() causes a cqe with -ECANCELED?
>
> I'm currently trying to prototype an IORING_POLL_CANCEL_ON_CLOSE
> flag that can be passed to POLL_ADD.
> With that we'll register the request in &req->file->f_uring_poll
> (similar to the file->f_ep list for epoll). Then we only get a real
> reference to the file during the call to vfs_poll(); otherwise we drop
> the fget/fput reference and rely on an io_uring_poll_release_file()
> (similar to eventpoll_release_file()) to cancel our registered poll request.

Yes, this is a bit tricky as we hold the file ref across the operation.
I'd be interested in seeing your approach to this, and also how it
would interact with registered files...

>> c) A simple pipe-based performance test shows the following numbers:
>> - 'poll': Got 232387.31 pipe events/sec
>> - 'epoll': Got 251125.25 pipe events/sec
>> - 'samba_io_uring_ev': Got 210998.77 pipe events/sec
>> So the io_uring backend is even slower than the 'poll' backend.
>> I guess the reason is the constant re-submission of IORING_OP_POLL_ADD.
>
> I added some feature autodetection today and I'm now using
> IORING_SETUP_COOP_TASKRUN, IORING_SETUP_TASKRUN_FLAG,
> IORING_SETUP_SINGLE_ISSUER and IORING_SETUP_DEFER_TASKRUN if supported
> by the kernel.
>
> On a 6.1 kernel this improved the performance a lot; it's now faster
> than the epoll backend.
>
> The key flag is IORING_SETUP_DEFER_TASKRUN. On a different system from the one above
> I'm getting the following numbers:
> - 'epoll': Got 114450.16 pipe events/sec
> - 'poll': Got 105872.52 pipe events/sec
> - 'samba_io_uring_ev-without-defer_taskrun': Got 95564.22 pipe events/sec
> - 'samba_io_uring_ev-with-defer_taskrun': Got 122853.85 pipe events/sec

Any chance you can do a run with just IORING_SETUP_COOP_TASKRUN set?
I'm curious how big of an impact the IPI elimination is, and where it slots
in compared to defer taskrun and the default settings.

>> My hope would be that IORING_POLL_ADD_MULTI + IORING_POLL_ADD_LEVEL
>> would be able to avoid the performance problem with samba_io_uring_ev
>> compared to epoll.
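The feature autodetection described in point c can be sketched with the raw io_uring_setup(2) syscall: try the preferred flag set first and fall back whenever the kernel rejects the flags with EINVAL, which it does for flags it does not know and for invalid combinations. This is a minimal sketch under stated assumptions, not the Samba code: the helper setup_ring_with_fallback() and the candidate ordering are hypothetical, and the IORING_SETUP_* values are only defined locally if the installed kernel headers predate them.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/io_uring.h>

/* UAPI values, defined here only if the installed headers are older. */
#ifndef IORING_SETUP_COOP_TASKRUN
#define IORING_SETUP_COOP_TASKRUN (1U << 8)
#endif
#ifndef IORING_SETUP_TASKRUN_FLAG
#define IORING_SETUP_TASKRUN_FLAG (1U << 9)
#endif
#ifndef IORING_SETUP_SINGLE_ISSUER
#define IORING_SETUP_SINGLE_ISSUER (1U << 12)
#endif
#ifndef IORING_SETUP_DEFER_TASKRUN
#define IORING_SETUP_DEFER_TASKRUN (1U << 13)
#endif
#ifndef __NR_io_uring_setup
#define __NR_io_uring_setup 425
#endif

/* Hypothetical candidate list, most preferred first. */
static const unsigned candidate_flags[] = {
    /* All four flags mentioned in the thread (DEFER_TASKRUN needs SINGLE_ISSUER). */
    IORING_SETUP_COOP_TASKRUN | IORING_SETUP_TASKRUN_FLAG |
        IORING_SETUP_SINGLE_ISSUER | IORING_SETUP_DEFER_TASKRUN,
    /* No deferred task running, but still no IPIs. */
    IORING_SETUP_COOP_TASKRUN | IORING_SETUP_TASKRUN_FLAG,
    /* Default settings. */
    0,
};

/*
 * Try each candidate flag set in order. Returns the ring fd (the caller
 * must close it) and stores the accepted flags in *flags_out, or returns
 * -1 with errno set and leaves *flags_out untouched.
 */
static int setup_ring_with_fallback(unsigned entries, unsigned *flags_out)
{
    size_t n = sizeof(candidate_flags) / sizeof(candidate_flags[0]);

    for (size_t i = 0; i < n; i++) {
        struct io_uring_params p;

        memset(&p, 0, sizeof(p));
        p.flags = candidate_flags[i];
        int fd = (int)syscall(__NR_io_uring_setup, entries, &p);
        if (fd >= 0) {
            *flags_out = candidate_flags[i];
            return fd;
        }
        /* Only EINVAL means "try fewer flags"; ENOSYS/EPERM etc. are final. */
        if (errno != EINVAL)
            return -1;
    }
    return -1;
}
```

A run with just IORING_SETUP_COOP_TASKRUN, as asked above, would amount to passing that single flag instead of walking the list. Only EINVAL triggers a fallback, since errors such as ENOSYS or EPERM mean io_uring is unavailable altogether and no smaller flag set will help.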
>
> I've started with an IORING_POLL_ADD_MULTI + IORING_POLL_ADD_LEVEL prototype,
> but it's not very far along yet, and due to the IORING_SETUP_DEFER_TASKRUN
> speedup I'll postpone working on IORING_POLL_ADD_LEVEL.

OK

-- 
Jens Axboe
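The epoll behaviour that point b wants IORING_OP_POLL_ADD to mirror can be observed from user space: an epoll registration does not pin the file, so once the last descriptor referring to the open file description is closed, the file is released and drops out of the interest set without an EPOLL_CTL_DEL. The sketch below (Linux only; the helper epoll_events_around_close() is hypothetical) makes a pipe's read end ready, then counts the events epoll_wait() reports before and after closing it. It does not exercise io_uring itself.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Register a pipe's read end with epoll, make it readable, then close()
 * it. Stores the epoll_wait() event counts seen before and after the
 * close in *before and *after. Returns 0 on success, -1 on setup error.
 */
static int epoll_events_around_close(int *before, int *after)
{
    int pfd[2];
    int ep;
    int ret = -1;
    struct epoll_event ev, out[4];

    if (pipe(pfd) != 0)
        return -1;
    ep = epoll_create1(EPOLL_CLOEXEC);
    if (ep < 0)
        goto close_pipe;

    ev.events = EPOLLIN;
    ev.data.fd = pfd[0];
    if (epoll_ctl(ep, EPOLL_CTL_ADD, pfd[0], &ev) != 0)
        goto close_all;

    /* Make the read end readable; level-triggered epoll keeps reporting it. */
    if (write(pfd[1], "x", 1) != 1)
        goto close_all;
    *before = epoll_wait(ep, out, 4, 0);

    /* Closing the only descriptor releases the file, which also removes
     * it from every epoll interest set. */
    close(pfd[0]);
    pfd[0] = -1;
    *after = epoll_wait(ep, out, 4, 0);
    ret = 0;

close_all:
    close(ep);
close_pipe:
    if (pfd[0] >= 0)
        close(pfd[0]);
    close(pfd[1]);
    return ret;
}
```

With a single descriptor for the read end, the first epoll_wait() reports one event and the second reports none. Had the descriptor been dup()ed, the registration would survive until every duplicate was closed too.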