Re: [PATCH v2 0/2] Optimise io_uring completion waiting

Hi, and thanks for the feedback.

It could indeed be done with @cond; that is how it works for now.
However, this patchset addresses performance issues only.

The problem with wait_event_*() is that if we have a counter and try to
wake up tasks after each increment, each waiting task gets scheduled
O(threshold) times just to spuriously check @cond and go back to sleep.
All that overhead (memory barriers, register save/restore, accounting,
etc.) turned out to be enough to slow the system down for some
workloads.
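
To make the O(threshold) point concrete, here is a toy model in the same
Python-style notation as the pseudocode below. It is only a sketch: it
counts how often a single waiter would be scheduled, it is not kernel
code, and both function names are made up for illustration.

```python
def count_wakeups_cond(threshold):
    """Toy model of wait_event()-style waiting (hypothetical helper).

    The waker wakes the task after every counter increment; the task
    then re-checks its condition and sleeps again if it is still false.
    """
    counter = 0
    wakeups = 0
    while counter < threshold:
        counter += 1   # one completion arrives
        wakeups += 1   # task is scheduled just to re-check the condition
    return wakeups


def count_wakeups_threshold(threshold):
    """Toy model of a threshold-aware wake-up (hypothetical helper).

    The waker compares the counter with the waiter's threshold itself
    and wakes the task only once the threshold has been reached.
    """
    counter = 0
    wakeups = 0
    while counter < threshold:
        counter += 1   # one completion arrives
        if counter >= threshold:
            wakeups += 1   # the only time the task is scheduled
    return wakeups
```

In this model a waiter with threshold N is scheduled N times in the
first case and once in the second; the per-wake-up cost itself is not
modelled.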

With this specialisation, the waker still traverses the wait list and
makes indirect calls to the checker callback, but the list is expected
to be fairly small, so performance there shouldn't be a problem, at
least for now.

Regarding semantics: it should wake a task when the value passed to
wake_up_threshold() is greater than or equal to the task's threshold,
which is specified individually for each task in wait_threshold_*().

In pseudo code:
```
def wake_up_threshold(n, wait_queue):
	for waiter in wait_queue:
		waiter.wake_up_if(n >= waiter.threshold);
```
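
A runnable version of the same idea, as a rough sketch only. The Waiter
class, its fields and the returned list of names are assumptions made for
illustration; the real code works on kernel wait queue entries, not on
Python objects.

```python
from dataclasses import dataclass


@dataclass
class Waiter:
    """Hypothetical stand-in for a wait queue entry with its own threshold."""
    name: str
    threshold: int
    woken: bool = False


def wake_up_threshold(n, wait_queue):
    """Wake every waiter whose threshold is <= n.

    Returns the names of the waiters woken by this call (illustrative
    only, so the effect is easy to inspect).
    """
    woken_now = []
    for waiter in wait_queue:
        # Skip waiters that were already woken by an earlier call.
        if not waiter.woken and n >= waiter.threshold:
            waiter.woken = True
            woken_now.append(waiter.name)
    return woken_now


queue = [Waiter("a", 1), Waiter("b", 8), Waiter("c", 32)]
first = wake_up_threshold(8, queue)    # wakes "a" and "b", "c" keeps sleeping
second = wake_up_threshold(32, queue)  # now wakes "c" as well
```

Each waiter is compared against its own threshold, so a single call can
wake some tasks and leave others asleep.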

Any thoughts on how to do it better? Ideas are very welcome.

BTW, this monster is mostly a copy-paste from wait_event_*() and
wait_bit_*(). We could try to extract some common parts from these
three, but that's another topic.


On 23/09/2019 11:35, Ingo Molnar wrote:
> 
> * Jens Axboe <axboe@xxxxxxxxx> wrote:
> 
>> On 9/22/19 2:08 AM, Pavel Begunkov (Silence) wrote:
>>> From: Pavel Begunkov <asml.silence@xxxxxxxxx>
>>>
>>> There could be a lot of overhead within generic wait_event_*() used for
>>> waiting for large number of completions. The patchset removes much of
>>> it by using custom wait event (wait_threshold).
>>>
>>> Synthetic test showed ~40% performance boost. (see patch 2)
>>
>> I'm fine with the io_uring side of things, but to queue this up we
>> really need Peter or Ingo to sign off on the core wakeup bits...
>>
>> Peter?
> 
> I'm not sure an extension is needed for such a special interface, why not 
> just put a ->threshold value next to the ctx->wait field and use either 
> the regular wait_event() APIs with the proper condition, or 
> wait_event_cmd() style APIs if you absolutely need something more complex 
> to happen inside?
> 
> Should result in a much lower linecount and no scheduler changes. :-)
> 
> Thanks,
> 
> 	Ingo
> 

-- 
Yours sincerely,
Pavel Begunkov
