Hi Linus,

I've had this done for months and posted it a few times, but it has
received little attention. Sending it out for inclusion now, as having
it caught up in upstream limbo is preventing further use cases of it at
Meta. We upstream every feature that we develop, and we don't put any
features into our kernel that aren't already upstream, or on the way
upstream. This is obviously especially important when an API is
involved.

This adds an epoll_ctl method for setting the minimum wait time for
retrieving events. In production, workloads don't just run mostly idle
or mostly full tilt. A common pattern is medium load. epoll_wait and
friends take a cap on the max number of events and the max time we want
to wait for them, but there's no notion of min events or min time. This
leads to services only getting a single event, even if they are totally
fine with waiting e.g. 200 usec for more events. More events per wakeup
leads to greater efficiency in handling them.

The main patch has some numbers, but the tl;dr is that we see a nice
reduction in context switches / second, and a reduction in busy time on
such systems.

It has been suggested that a syscall should be available for this as
well, and there are two main reasons why this wasn't pursued (but was
still investigated):

- This most likely should've been done as epoll_pwait3(), as we already
  have epoll_wait, epoll_pwait, and epoll_pwait2. The latter two are
  already at the max number of syscall arguments, so a new method would
  have to be done where a struct would define the API. With some
  arguments being optional, this could get inefficient or ugly (or
  both).

- Main reason is that Meta doesn't need it. By using epoll_ctl, the
  check-for-support-of-feature can be relegated to setup time rather
  than in the fast path, and the workloads we are looking at would not
  need different min wait settings within a single epoll context.

Please pull!

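For illustration only, the gap described above can be roughly
approximated in userspace with existing calls, at the cost of three
syscalls per wakeup instead of one. The sketch below is not part of this
series and does not use the new epoll_ctl method: epoll_wait_batched()
and demo_ready_count() are hypothetical names, and the behavior (sleep a
fixed time after the first event shows up, then reap without blocking)
is an assumption made for the example, not the semantics the kernel
patches implement.

```c
#include <assert.h>
#include <poll.h>
#include <sys/epoll.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the series): approximate a minimum wait
 * in userspace. Waits up to timeout_ms for the first event, then sleeps
 * min_wait_usec so more events can queue up, then reaps without blocking.
 */
int epoll_wait_batched(int epfd, struct epoll_event *events, int maxevents,
                       int timeout_ms, long min_wait_usec)
{
    struct pollfd pfd = { .fd = epfd, .events = POLLIN };
    struct timespec ts;
    int ret;

    assert(maxevents > 0 && min_wait_usec >= 0);

    /* An epoll fd is itself pollable: readable means events are pending. */
    ret = poll(&pfd, 1, timeout_ms);
    if (ret <= 0)
        return ret; /* 0 on timeout, -1 with errno set on error */

    ts.tv_sec = min_wait_usec / 1000000;
    ts.tv_nsec = (min_wait_usec % 1000000) * 1000;
    nanosleep(&ts, NULL); /* best effort; an early wakeup is harmless */

    /* Zero timeout: collect whatever has accumulated so far. */
    return epoll_wait(epfd, events, maxevents, 0);
}

/* Usage sketch: two pipes with pending data, reaped in a single call. */
int demo_ready_count(void)
{
    struct epoll_event ev = { .events = EPOLLIN }, out[8];
    int a[2], b[2], n, epfd = epoll_create1(0);

    if (epfd < 0 || pipe(a) < 0 || pipe(b) < 0)
        return -1;
    ev.data.fd = a[0];
    epoll_ctl(epfd, EPOLL_CTL_ADD, a[0], &ev);
    ev.data.fd = b[0];
    epoll_ctl(epfd, EPOLL_CTL_ADD, b[0], &ev);
    if (write(a[1], "x", 1) != 1 || write(b[1], "y", 1) != 1)
        return -1;

    /* Wait up to 1000 msec for the first event, then 200 usec for more. */
    n = epoll_wait_batched(epfd, out, 8, 1000, 200);
    close(a[0]); close(a[1]); close(b[0]); close(b[1]); close(epfd);
    return n;
}
```

Every caller pays the extra poll() and nanosleep() here, and the sleep
is taken even when the event array could already be filled.
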
The following changes since commit ef4d3ea40565a781c25847e9cb96c1bd9f462bc6:

  afs: Fix server->active leak in afs_put_server (2022-11-30 10:02:37 -0800)

are available in the Git repository at:

  git://git.kernel.dk/linux.git tags/epoll-min_ts-2022-12-08

for you to fetch changes up to 73b9320234c0ad1b5e6f576abb796221eb088c64:

  eventpoll: ensure we pass back -EBADF for a bad file descriptor (2022-12-08 07:05:42 -0700)

----------------------------------------------------------------
epoll-min_ts-2022-12-08

----------------------------------------------------------------
Jens Axboe (8):
      eventpoll: cleanup branches around sleeping for events
      eventpoll: don't pass in 'timed_out' to ep_busy_loop()
      eventpoll: split out wait handling
      eventpoll: move expires to epoll_wq
      eventpoll: move file checking earlier for epoll_ctl()
      eventpoll: add support for min-wait
      eventpoll: add method for configuring minimum wait on epoll context
      eventpoll: ensure we pass back -EBADF for a bad file descriptor

 fs/eventpoll.c                 | 192 +++++++++++++++++++++++++++++++++--------
 include/linux/eventpoll.h      |   2 +-
 include/uapi/linux/eventpoll.h |   1 +
 3 files changed, 158 insertions(+), 37 deletions(-)

-- 
Jens Axboe