Re: [PATCH 2/2] epoll: atomically remove wait entry on wake up

On 4/30/20 9:03 AM, Roman Penyaev wrote:
> This patch does two things:
> 
> 1. fixes a lost wakeup introduced by:
>   339ddb53d373 ("fs/epoll: remove unnecessary wakeups of nested epoll")
> 
> 2. improves performance of event delivery.
> 
> The problem is the following: if N (>1) threads are waiting on ep->wq
> for new events and M (>1) events arrive, it is quite likely that more
> than one wakeup hits the same wait queue entry, because there is a
> large window between the __add_wait_queue_exclusive() and the
> following __remove_wait_queue() calls in ep_poll().  This can lead to
> lost wakeups, because the thread that was woken up may not handle all
> the events in ->rdllist.  (The problem is described in more detail
> here: https://lkml.org/lkml/2019/10/7/905)
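
For context, here is a simplified sketch of the pre-patch wait path in
ep_poll() (locking, timeout and signal handling are elided; this is not
the verbatim fs/eventpoll.c code):

    /*
     * The wait entry is added once and removed only after the loop, so
     * between the wakeup delivered inside schedule_hrtimeout_range()
     * and the final __remove_wait_queue() the entry is still on
     * ep->wq, and a second wakeup can hit this already-woken waiter.
     */
    init_waitqueue_entry(&wait, current);   /* uses default_wake_function */
    __add_wait_queue_exclusive(&ep->wq, &wait);

    for (;;) {
            set_current_state(TASK_INTERRUPTIBLE);
            if (ep_events_available(ep) || timed_out)
                    break;
            timed_out = !schedule_hrtimeout_range(to, slack,
                                                  HRTIMER_MODE_ABS);
    }

    __set_current_state(TASK_RUNNING);
    __remove_wait_queue(&ep->wq, &wait);    /* window closes only here */
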
> 
> The idea of this patch is to use init_wait() instead of
> init_waitqueue_entry().  Internally, init_wait() sets
> autoremove_wake_function as the wakeup callback, which removes the
> wait entry from the list atomically (under the wait queue lock), so
> the next wakeup hits the next wait entry in the queue, preventing
> lost wakeups.
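
For reference, the atomic removal comes from the generic wait code:
init_wait() installs autoremove_wake_function() as the entry's wakeup
callback, which (roughly as implemented in kernel/sched/wait.c) unlinks
the entry while the waker still holds the wait queue head's lock:

    int autoremove_wake_function(struct wait_queue_entry *wq_entry,
                                 unsigned mode, int sync, void *key)
    {
            int ret = default_wake_function(wq_entry, mode, sync, key);

            /*
             * Woken successfully: unlink under the wq head lock held
             * by the waker, so the next wakeup walks on to the next
             * exclusive entry instead of hitting this one again.
             */
            if (ret)
                    list_del_init(&wq_entry->entry);

            return ret;
    }

Since removal happens in the wakeup path itself, the woken thread no
longer needs to unconditionally take ep->lock to dequeue itself; it can,
for example, first check list_empty_careful(&wait.entry) and only take
the lock if the entry is still queued.
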
> 
> The problem is reliably reproduced by the epoll60 test case [1].
> 
> Removing the wait entry on wakeup also has performance benefits,
> because there is no need to take ep->lock and remove the wait entry
> from the queue after a successful wakeup.  Here is the timing output
> of the epoll60 test case:
> 
>   With explicit wakeup from ep_scan_ready_list() (the state of the
>   code prior to 339ddb53d373):
> 
>     real    0m6.970s
>     user    0m49.786s
>     sys     0m0.113s
> 
>  After this patch:
> 
>    real    0m5.220s
>    user    0m36.879s
>    sys     0m0.019s
> 
> The other test case is stress-epoll [2], where one thread consumes
> all the events and the other threads produce many events:
> 
>   With explicit wakeup from ep_scan_ready_list() (the state of the
>   code prior to 339ddb53d373):
> 
>     threads  events/ms  run-time ms
>           8       5427         1474
>          16       6163         2596
>          32       6824         4689
>          64       7060         9064
>         128       6991        18309
> 
>  After this patch:
> 
>     threads  events/ms  run-time ms
>           8       5598         1429
>          16       7073         2262
>          32       7502         4265
>          64       7640         8376
>         128       7634        16767
> 
>  ("events/ms" represents event bandwidth, so higher is better;
>   "run-time ms" represents the overall time spent running the
>   benchmark, so lower is better)
> 
> [1] tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c
> [2] https://github.com/rouming/test-tools/blob/master/stress-epoll.c
> 
> Signed-off-by: Roman Penyaev <rpenyaev@xxxxxxx>
> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Cc: Khazhismel Kumykov <khazhy@xxxxxxxxxx>
> Cc: Alexander Viro <viro@xxxxxxxxxxxxxxxxxx>
> Cc: Heiher <r@xxxxxx>
> Cc: Jason Baron <jbaron@xxxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx
> Cc: linux-fsdevel@xxxxxxxxxxxxxxx
> Cc: linux-kernel@xxxxxxxxxxxxxxx
> ---
>  fs/eventpoll.c | 43 ++++++++++++++++++++++++-------------------
>  1 file changed, 24 insertions(+), 19 deletions(-)
> 

Looks good to me and nice speedups.
Reviewed-by: Jason Baron <jbaron@xxxxxxxxxx>

Thanks,

-Jason