Hi, and thanks for the feedback.

It could indeed be done with @cond; that's how it works for now. However,
the patchset addresses performance issues only.

The problem with wait_event_*() is that, if we have a counter and are
trying to wake up tasks after each increment, it would schedule each
waiting task O(threshold) times just for it to spuriously check @cond and
go back to sleep. All that overhead (memory barriers, register save/load,
accounting, etc.) turned out to be enough to slow down the system for some
workloads.

With this specialisation it still traverses a wait list and makes indirect
calls to the checker callback, but the list is supposedly fairly small, so
performance there shouldn't be a problem, at least for now.

Regarding semantics: it should wake a task when the value passed to
wake_up_threshold() is greater than or equal to the task's threshold,
which is specified individually for each task in wait_threshold_*().

In pseudo code:
```
def wake_up_threshold(n, wait_queue):
    for waiter in wait_queue:
        waiter.wake_up_if(n >= waiter.threshold)
```

Any thoughts on how to do it better? Ideas are very welcome.

BTW, this monster is mostly a copy-paste from wait_event_*() and
wait_bit_*(). We could try to extract some common parts from these three,
but that's another topic.


On 23/09/2019 11:35, Ingo Molnar wrote:
>
> * Jens Axboe <axboe@xxxxxxxxx> wrote:
>
>> On 9/22/19 2:08 AM, Pavel Begunkov (Silence) wrote:
>>> From: Pavel Begunkov <asml.silence@xxxxxxxxx>
>>>
>>> There could be a lot of overhead within generic wait_event_*() used for
>>> waiting for large number of completions. The patchset removes much of
>>> it by using custom wait event (wait_threshold).
>>>
>>> Synthetic test showed ~40% performance boost. (see patch 2)
>>
>> I'm fine with the io_uring side of things, but to queue this up we
>> really need Peter or Ingo to sign off on the core wakeup bits...
>>
>> Peter?
>
> I'm not sure an extension is needed for such a special interface, why not
> just put a ->threshold value next to the ctx->wait field and use either
> the regular wait_event() APIs with the proper condition, or
> wait_event_cmd() style APIs if you absolutely need something more complex
> to happen inside?
>
> Should result in a much lower linecount and no scheduler changes. :-)
>
> Thanks,
>
> 	Ingo
>

-- 
Yours sincerely,
Pavel Begunkov
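
To illustrate the wake-up counts discussed in the reply above, here is a
minimal user-space sketch. It is only a model under stated assumptions,
not the kernel code: Waiter, wake_up_all() and simulate() are hypothetical
names made up for this example, and wake_up_threshold() merely mirrors the
pseudo code. It counts how many times each waiter would be scheduled when
a counter is incremented one step at a time.

```python
from dataclasses import dataclass


@dataclass
class Waiter:
    """Hypothetical stand-in for a sleeping task (not a kernel structure)."""
    threshold: int
    wakeups: int = 0    # how many times the task was scheduled
    done: bool = False  # True once the task's condition has been met


def wake_up_all(n, wait_queue):
    """wait_event_*()-like model: every pending waiter is woken and
    re-checks its own condition, going back to sleep if it is not met."""
    for waiter in wait_queue:
        if waiter.done:
            continue
        waiter.wakeups += 1            # scheduled, possibly spuriously
        if n >= waiter.threshold:
            waiter.done = True


def wake_up_threshold(n, wait_queue):
    """Threshold model: the waker checks the condition itself and wakes
    only the waiters whose threshold has been reached."""
    for waiter in wait_queue:
        if not waiter.done and n >= waiter.threshold:
            waiter.wakeups += 1
            waiter.done = True


def simulate(wake, thresholds):
    """Increment a counter from 1 to max(thresholds), calling `wake`
    after each increment; return the wake-up count per waiter."""
    wait_queue = [Waiter(t) for t in thresholds]
    for n in range(1, max(thresholds) + 1):
        wake(n, wait_queue)
    return [w.wakeups for w in wait_queue]


if __name__ == "__main__":
    print(simulate(wake_up_all, [3, 8]))        # O(threshold) wake-ups each
    print(simulate(wake_up_threshold, [3, 8]))  # one wake-up each
```

In this model the first strategy schedules a waiter once per increment
until its threshold is reached, while the second schedules it exactly
once; the cost of walking the list and calling the checker is not
modelled.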