On 4/15/22 4:53 PM, Jens Axboe wrote: > On 4/15/22 4:41 PM, Pavel Begunkov wrote: >> On 4/15/22 23:03, Jens Axboe wrote: >>> On 4/15/22 3:05 PM, Pavel Begunkov wrote: >>>> On 4/12/22 17:46, Jens Axboe wrote: >>>>> On 4/12/22 10:41 AM, Jens Axboe wrote: >>>>>> On 4/12/22 10:24 AM, Pavel Begunkov wrote: >>>>>>> If all completed requests in io_do_iopoll() were marked with >>>>>>> REQ_F_CQE_SKIP, we'll not only skip CQE posting but also >>>>>>> io_free_batch_list() leaking memory and resources. >>>>>>> >>>>>>> Move @nr_events increment before REQ_F_CQE_SKIP check. We'll potentially >>>>>>> return the value greater than the real one, but iopolling will deal with >>>>>>> it and the userspace will re-iopoll if needed. In anyway, I don't think >>>>>>> there are many use cases for REQ_F_CQE_SKIP + IOPOLL. >>>>>> >>>>>> Ah good catch - yes probably not much practical concern, as the lack of >>>>>> ordering for file IO means that CQE_SKIP isn't really useful for that >>>>>> scenario. >>>>> >>>>> One potential snag is with the change we're now doing >>>>> io_cqring_ev_posted_iopoll() even if didn't post an event. Again >>>>> probably not a practical concern, but it is theoretically a violation >>>>> if an eventfd is used. >>>> Looks this didn't get applied. Are you concerned about eventfd? >>> >>> Yep, was hoping to get a reply back, so just deferred it for now. >>> >>>> Is there any good reason why the userspace can't tolerate spurious >>>> eventfd events? Because I don't think we should care this case >>> >>> I always forget the details on that, but we've had cases like this in >>> the past where some applications assume that if they got N eventfd >>> events, then are are also N events in the ring. Which granted is a bit >>> odd, but it does also make some sense. Why would you have more eventfd >>> events posted than events? >> >> For the same reason why it can get less eventfd events than there are >> CQEs, as for me it's only a communication channel but not a >> replacement for completion events. > > That part is inherently racy in that we might get some CQEs while we > respond to the initial eventfd notifications. But I'm totally agreeing > with you, and it doesn't seem like a big deal to me. > >> Ok, we don't want to break old applications, but it's a new most >> probably not widely used feature, and we can say that the userspace >> has to handle spurious eventfd. > > If I were to guess, I'd say it's probably epoll + eventfd conversions. > But it should just be made explicit. Since events reaped and checked > happen differently anyway, it seems like a bad assumption to make that > eventfd notifications == events available. The patch is against the 5.19 branch, but it might be a better idea to do this for 5.18 as the 5.17 backport will then not need assistance. Can you send it against io_uring-5.18? -- Jens Axboe