Re: [PATCH 8/8] io_uring: skip redundant poll wakeups

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 1/28/25 13:39, Max Kellermann wrote:
Using io_uring with epoll is very expensive because every completion
leads to a __wake_up() call, most of which are unnecessary because the
polling process has already been woken up but has not had a chance to
process the completions.  During this time, wq_has_sleeper() still
returns true, therefore this check is not enough.

The poller is not required to call vfs_poll / io_uring_poll()
multiple times, in which case all subsequent events will be dropped
on the floor. E.g. the poller queues a poll entry in the first
io_uring_poll(), and then every time there is an event it'd do
vfs_read() or whatever without removing the entry.

I don't think we can make such assumptions about the poller, at
least without some help from it / special casing particular
callbacks.

...
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 137c2066c5a3..b65efd07e9f0 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
...
@@ -2793,6 +2794,9 @@ static __poll_t io_uring_poll(struct file *file, poll_table *wait)
if (unlikely(!ctx->poll_activated))
  		io_activate_pollwq(ctx);
+
+	atomic_set(&ctx->poll_wq_waiting, 1);

io_uring_poll()                 |
    poll_wq_waiting = 1         |
                                | io_poll_wq_wake()
                                |     poll_wq_waiting = 0
                                |     // no waiters yet => no wake ups
                                | <return to user space>
                                |    <consume all cqes>
    poll_wait()                 |
    return; // no events        |
                                | produce_cqes()
                                | io_poll_wq_wake()
                                |     if (poll_wq_waiting) wake();
                                |     // it's still 0, wake up is lost


  	/*
  	 * provides mb() which pairs with barrier from wq_has_sleeper
  	 * call in io_commit_cqring
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index f65e3f3ede51..186cee066f9f 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -287,7 +287,7 @@ static inline void io_commit_cqring(struct io_ring_ctx *ctx)
static inline void io_poll_wq_wake(struct io_ring_ctx *ctx)
  {
-	if (wq_has_sleeper(&ctx->poll_wq))
+	if (wq_has_sleeper(&ctx->poll_wq) && atomic_xchg_release(&ctx->poll_wq_waiting, 0) > 0)
  		__wake_up(&ctx->poll_wq, TASK_NORMAL, 0,
  				poll_to_key(EPOLL_URING_WAKE | EPOLLIN));
  }

--
Pavel Begunkov





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux