On 4/15/22 23:03, Jens Axboe wrote:
On 4/15/22 3:05 PM, Pavel Begunkov wrote:
On 4/12/22 17:46, Jens Axboe wrote:
On 4/12/22 10:41 AM, Jens Axboe wrote:
On 4/12/22 10:24 AM, Pavel Begunkov wrote:
If all completed requests in io_do_iopoll() were marked with
REQ_F_CQE_SKIP, we'll not only skip CQE posting but also
io_free_batch_list() leaking memory and resources.
Move @nr_events increment before REQ_F_CQE_SKIP check. We'll potentially
return the value greater than the real one, but iopolling will deal with
it and the userspace will re-iopoll if needed. In anyway, I don't think
there are many use cases for REQ_F_CQE_SKIP + IOPOLL.
Ah good catch - yes probably not much practical concern, as the lack of
ordering for file IO means that CQE_SKIP isn't really useful for that
scenario.
One potential snag is with the change we're now doing
io_cqring_ev_posted_iopoll() even if didn't post an event. Again
probably not a practical concern, but it is theoretically a violation
if an eventfd is used.
Looks this didn't get applied. Are you concerned about eventfd?
Yep, was hoping to get a reply back, so just deferred it for now.
Is there any good reason why the userspace can't tolerate spurious
eventfd events? Because I don't think we should care this case
I always forget the details on that, but we've had cases like this in
the past where some applications assume that if they got N eventfd
events, then are are also N events in the ring. Which granted is a bit
odd, but it does also make some sense. Why would you have more eventfd
events posted than events?
For the same reason why it can get less eventfd events than there are
CQEs, as for me it's only a communication channel but not a
replacement for completion events.
Ok, we don't want to break old applications, but it's a new most
probably not widely used feature, and we can say that the userspace
has to handle spurious eventfd.
So while I don't think it's a huge issue, and particularly because
IOPOLL and eventfd would be a nonsensical combo, it would still be nice
to generally make sure it's the case.
This isn't the only one though, so maybe we just apply this fix and do
a full check down the line. Can't see this one making issues.
--
Pavel Begunkov