Re: [PATCH] eventfd: support delayed wakeup for non-semaphore eventfd to reduce cpu utilization

Wen Yang <wenyang.linux@xxxxxxxxxxx> · Fri, 21 Apr 2023 01:44:35 +0800

On 2023/4/20 00:42, Jens Axboe wrote:
On 4/19/23 3:12 AM, Christian Brauner wrote:
On Tue, Apr 18, 2023 at 08:15:03PM -0600, Jens Axboe wrote:
On 4/17/23 10:32 AM, Wen Yang wrote:
On 2023/4/17 22:38, Jens Axboe wrote:
On 4/16/23 5:31 AM, wenyang.linux@xxxxxxxxxxx wrote:
From: Wen Yang <wenyang.linux@xxxxxxxxxxx>

For a non-semaphore eventfd, if its counter has a nonzero value, a
read(2) returns 8 bytes containing that value, and the counter's value
is reset to zero. Therefore, in the non-semaphore case, N eventfd
writes may be consumed by a single read.
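The non-semaphore semantics described above can be observed from user space with a short C sketch. It is Linux-only, and the function name `coalesced_read` is hypothetical, used only for illustration. The eventfd is opened with `EFD_NONBLOCK` so that the second read, which checks that the counter was reset, fails with `EAGAIN` instead of blocking.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Hypothetical demo: perform `writes` write(2) calls of the value 1 on a
 * non-semaphore eventfd, then a single read(2). Returns the value that
 * read produced (0 on error). *reset is set to 1 if a second read finds
 * the counter back at zero (EAGAIN on a non-blocking fd), 0 otherwise.
 * Linux-only: eventfd(2) is not portable.
 */
uint64_t coalesced_read(unsigned writes, int *reset)
{
    assert(writes > 0); /* a zero counter would make the first read fail */
    *reset = 0;

    /* No EFD_SEMAPHORE flag: read(2) returns the whole counter. */
    int fd = eventfd(0, EFD_NONBLOCK);
    if (fd < 0)
        return 0;

    for (unsigned i = 0; i < writes; i++) {
        uint64_t one = 1;
        if (write(fd, &one, sizeof(one)) != (ssize_t)sizeof(one)) {
            close(fd);
            return 0;
        }
    }

    /* One read returns the sum of all writes and resets the counter. */
    uint64_t value = 0;
    if (read(fd, &value, sizeof(value)) != (ssize_t)sizeof(value))
        value = 0;

    /* The counter is now zero, so a second non-blocking read fails. */
    uint64_t again;
    if (read(fd, &again, sizeof(again)) < 0 && errno == EAGAIN)
        *reset = 1;

    close(fd);
    return value;
}
```

On Linux, `coalesced_read(5, &reset)` returns 5 and sets `reset` to 1: five writes were folded into one read, after which the counter was back at zero.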

However, the current implementation wakes up the reading thread
immediately in eventfd_write(), which increases CPU utilization
unnecessarily.

Adding a configurable delay after eventfd_write() avoids these
unnecessary wakeups, thereby reducing CPU utilization.

What's the real-world use case for this, and what would the expected
delay be there? With a delayed work item for this, there's certainly a
pretty wide grey zone of delay values where this would perform
considerably worse than not doing any delayed wakeups at all.
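To make this trade-off concrete, here is a toy model in C. It is an illustrative assumption, not the patch's delayed work item: `simulate_delayed_wakeup()` and `struct wakeup_stats` are hypothetical, timestamps are in microseconds and strictly increasing, and the first write after an idle period arms a timer that folds in every write arriving before it fires. A delay of 0 models waking the reader on every write.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Result of replaying a trace of write timestamps against a policy. */
struct wakeup_stats {
    size_t wakeups;        /* number of times the reader is woken */
    uint64_t max_added_us; /* worst extra latency seen by any write */
};

/*
 * Hypothetical model of "delay the wakeup by delay_us": the first write
 * after an idle period arms a timer; every write landing no later than
 * the timer firing shares that single wakeup. delay_us == 0 models an
 * immediate wakeup per write. t_us must be strictly increasing.
 */
struct wakeup_stats simulate_delayed_wakeup(const uint64_t *t_us, size_t n,
                                            uint64_t delay_us)
{
    struct wakeup_stats s = {0, 0};
    size_t i = 0;

    while (i < n) {
        uint64_t fire = t_us[i] + delay_us; /* timer armed by this write */
        s.wakeups++;
        /* Fold in all writes that arrive before the timer fires. */
        while (i < n && t_us[i] <= fire) {
            assert(i == 0 || t_us[i] > t_us[i - 1]);
            uint64_t added = fire - t_us[i];
            if (added > s.max_added_us)
                s.max_added_us = added;
            i++;
        }
    }
    return s;
}
```

With ten writes 10 µs apart and a 100 µs delay, the model reports 1 wakeup instead of 10, at a worst-case cost of 100 µs added latency. With three writes 1000 µs apart and the same delay, it still reports 3 wakeups: no saving, just 100 µs of extra latency per event, which is the grey zone described above.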

Thanks for your comments.

We have found that the CPU usage of the message middleware in our
environment is high because sensor messages from the MCU are reported
constantly and very frequently, possibly several hundred thousand times
per second. As a result, the message-receiving thread is woken up
frequently to process short messages.

The following is the simplified test code:
https://github.com/w-simon/tests/blob/master/src/test.c

The test code in this patch is simplified even further.

Finally, this patch only adds a configuration option, giving users more
choice.

I think you'd have a higher chance of getting this in if the delay
setting were per eventfd context rather than a global one.

That patch seems really weird. Is addressing problems like this through
a configured wakeup delay an established paradigm? Because naively this
looks like a pretty brutal hack.

It is odd, and it is a brutal hack. My worries were outlined in an
earlier reply: there's quite a big gap where no delay would be better,
and the delay approach would be miserable because it'd cause extra
latency and extra context switches. It'd be much cleaner if you KNEW
there'd be more events coming, as you could then get rid of that delayed
work item completely. And I suspect that, if this patch makes sense,
it'd be better to have a number+time limit as well: if you hit the event
count, you'd notify inline, and you'd put some smarts in the delayed
work handling to just do nothing if nothing is pending.
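The number+time limit suggested above can be sketched as a small user-space state machine. This is only a model under stated assumptions, not the v2 patch and not kernel code: `struct batch_ctx`, `batch_init()`, `batch_write()` and `batch_advance()` are hypothetical names, time is passed in explicitly rather than driven by a real delayed work item, and the limits live in a per-context struct, in line with the per-eventfd setting suggested earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-context state; none of these names are kernel API. */
struct batch_ctx {
    uint64_t max_pending;       /* notify inline once this many are pending */
    uint64_t delay_us;          /* else notify when the timer fires */
    uint64_t pending;           /* writes not yet signalled to the reader */
    bool timer_armed;
    uint64_t timer_deadline_us;
    uint64_t wakeups;           /* reader wakeups issued so far */
    uint64_t empty_timer_runs;  /* timer runs that found nothing pending */
};

void batch_init(struct batch_ctx *c, uint64_t max_pending, uint64_t delay_us)
{
    assert(max_pending > 0);
    *c = (struct batch_ctx){ .max_pending = max_pending, .delay_us = delay_us };
}

static void notify(struct batch_ctx *c)
{
    c->wakeups++;
    c->pending = 0;
}

/* Called for every write at time now_us. */
void batch_write(struct batch_ctx *c, uint64_t now_us)
{
    c->pending++;
    if (c->pending >= c->max_pending) {
        notify(c);  /* count limit hit: wake the reader inline */
        return;     /* an armed timer is left alone and will find nothing */
    }
    if (!c->timer_armed) {
        c->timer_armed = true;
        c->timer_deadline_us = now_us + c->delay_us;
    }
}

/* Advance model time to now_us; run the timer if its deadline passed. */
void batch_advance(struct batch_ctx *c, uint64_t now_us)
{
    if (!c->timer_armed || now_us < c->timer_deadline_us)
        return;
    c->timer_armed = false;
    if (c->pending == 0) {
        c->empty_timer_runs++;  /* nothing pending: do nothing */
        return;
    }
    notify(c);
}
```

With `max_pending` = 4 and a 100 µs delay, four back-to-back writes produce one inline wakeup, and the timer that fires later finds nothing pending and is only counted in `empty_timer_runs`. A single isolated write is instead signalled when its timer fires 100 µs later.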

Thank you very much for your suggestion.

We will improve the implementation according to your suggestion and
send v2 later.

--

Best wishes,

Wen