Hi,

The following are some incremental optimizations to parts of the epoll
core. Each patch has the details, but together, the series is seen to
shave off measurable cycles on a number of systems and workloads.

For example, on a 40-core IB, both pipetest and a parallel epoll_wait()
benchmark show around a 20-30% increase in raw operations per second
when the box is fully occupied (incremental thread counts), and up to
a 15% performance improvement with lower counts.

Passes ltp epoll related testcases.

Please consider for v4.21/5.0.

Thanks!

Davidlohr Bueso (6):
  fs/epoll: remove max_nests argument from ep_call_nested()
  fs/epoll: simplify ep_send_events_proc() ready-list loop
  fs/epoll: drop ovflist branch prediction
  fs/epoll: robustify ep->mtx held checks
  fs/epoll: reduce the scope of wq lock in epoll_wait()
  fs/epoll: avoid barrier after an epoll_wait(2) timeout

 fs/eventpoll.c | 206 ++++++++++++++++++++++++++++++---------------------------
 1 file changed, 108 insertions(+), 98 deletions(-)

--
2.16.4