On 9/24/19 3:20 AM, Pavel Begunkov wrote: > > > On 24/09/2019 11:27, Jens Axboe wrote: >> On 9/24/19 2:02 AM, Jens Axboe wrote: >>> On 9/24/19 1:06 AM, Pavel Begunkov wrote: >>>> On 24/09/2019 02:00, Jens Axboe wrote: >>>>>> I think we can do the same thing, just wrapping the waitqueue in a >>>>>> structure with a count in it, on the stack. Got some flight time >>>>>> coming up later today, let me try and cook up a patch. >>>>> >>>>> Totally untested, and sent out 5 min before departure... But something >>>>> like this. >>>> Hmm, reminds me my first version. Basically that's the same thing but >>>> with macroses inlined. I wanted to make it reusable and self-contained, >>>> though. >>>> >>>> If you don't think it could be useful in other places, sure, we could do >>>> something like that. Is that so? >>> >>> I totally agree it could be useful in other places. Maybe formalized and >>> used with wake_up_nr() instead of adding a new primitive? Haven't looked >>> into that, I may be talking nonsense. >>> >>> In any case, I did get a chance to test it and it works for me. Here's >>> the "finished" version, slightly cleaned up and with a comment added >>> for good measure. >> >> Notes: >> >> This version gets the ordering right, you need exclusive waits to get >> fifo ordering on the waitqueue. >> >> Both versions (yours and mine) suffer from the problem of potentially >> waking too many. I don't think this is a real issue, as generally we >> don't do threaded access to the io_urings. But if you had the following >> tasks wait on the cqring: >> >> [min_events = 32], [min_events = 8], [min_events = 8] >> >> and we reach the io_cqring_events() == threshold, we'll wake all three. >> I don't see a good solution to this, so I suspect we just live with >> until proven an issue. Both versions are much better than what we have >> now. >> > If io_cqring_events() == 8, only the last two would be woken up in both > implementations, as to_wait/threshold is specified per waiter. Isn't it? If io_cqring_events() == 8, then none would be woken in my implementation since the first one will break the wakeup loop. > Agree with waiting, I don't see a good real-life case for that, that > couldn't be done efficiently in userspace. Exactly -- Jens Axboe