From: Pavel Begunkov <asml.silence@xxxxxxxxx>

While waiting for completion events in io_cqring_wait(), the process
will be woken up inside wait_event_interruptible() on any request
completion, check the number of events in the completion queue and
potentially go to sleep again. There can be a lot of such spurious
wakeups, with a lot of associated overhead.

This especially manifests itself when min_events is large and
completions arrive one by one or in small batches (which is usually
the case). E.g. if the device completes requests one by one and
io_uring_enter() is waiting for 100 events, there will be ~99
spurious wakeups.

Use the new wait_threshold_*() primitives instead, which won't wake
the task up until the necessary number of events has been collected.

Performance test:
The first thread generates requests (QD=512) one by one, so they are
completed in a similar pattern. The second thread waits for 128
events to complete.

Tested with null_blk with a 5us delay and a 3.8GHz Intel CPU.

throughput before: 270 KIOPS
throughput after: 370 KIOPS

So, a ~40% throughput boost on this exaggerated test.

Signed-off-by: Pavel Begunkov <asml.silence@xxxxxxxxx>
---
 fs/io_uring.c | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 37395208a729..17d2d30b763a 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -70,6 +70,7 @@
 #include <linux/nospec.h>
 #include <linux/sizes.h>
 #include <linux/hugetlb.h>
+#include <linux/wait_threshold.h>
 
 #include <uapi/linux/io_uring.h>
 
@@ -403,6 +404,13 @@ static struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 	return ctx;
 }
 
+static unsigned int io_cqring_events(struct io_rings *rings)
+{
+	/* See comment at the top of this file */
+	smp_rmb();
+	return READ_ONCE(rings->cq.tail) - READ_ONCE(rings->cq.head);
+}
+
 static inline bool io_sequence_defer(struct io_ring_ctx *ctx,
 				     struct io_kiocb *req)
 {
@@ -521,7 +529,7 @@ static void io_cqring_fill_event(struct io_ring_ctx *ctx, u64 ki_user_data,
 static void io_cqring_ev_posted(struct io_ring_ctx *ctx)
 {
 	if (waitqueue_active(&ctx->wait))
-		wake_up(&ctx->wait);
+		wake_up_threshold(&ctx->wait, io_cqring_events(ctx->rings));
 	if (waitqueue_active(&ctx->sqo_wait))
 		wake_up(&ctx->sqo_wait);
 	if (ctx->cq_ev_fd)
@@ -546,7 +554,7 @@ static void io_ring_drop_ctx_refs(struct io_ring_ctx *ctx, unsigned refs)
 	percpu_ref_put_many(&ctx->refs, refs);
 
 	if (waitqueue_active(&ctx->wait))
-		wake_up(&ctx->wait);
+		wake_up_threshold(&ctx->wait, io_cqring_events(ctx->rings));
 }
 
 static struct io_kiocb *io_get_req(struct io_ring_ctx *ctx,
@@ -681,12 +689,6 @@ static void io_put_req(struct io_kiocb *req)
 		io_free_req(req);
 }
 
-static unsigned io_cqring_events(struct io_rings *rings)
-{
-	/* See comment at the top of this file */
-	smp_rmb();
-	return READ_ONCE(rings->cq.tail) - READ_ONCE(rings->cq.head);
-}
 
 /*
  * Find and free completed poll iocbs
@@ -2591,7 +2593,8 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
 			return ret;
 	}
 
-	ret = wait_event_interruptible(ctx->wait, io_cqring_events(rings) >= min_events);
+	ret = wait_threshold_interruptible(ctx->wait, min_events,
+					   io_cqring_events(rings));
 	restore_saved_sigmask_unless(ret == -ERESTARTSYS);
 	if (ret == -ERESTARTSYS)
 		ret = -EINTR;
-- 
2.22.0
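
Note: the wait_threshold_interruptible()/wake_up_threshold() primitives used
above presumably come from an earlier patch in this series and are not shown
here. As a rough, illustrative sketch of the idea only (not the actual
implementation, and with hypothetical names like threshold_wait and
threshold_wake_function), the same effect can be built on standard waitqueue
helpers: the waker passes the current event count as the wake key, and waiters
whose threshold has not been reached are simply skipped instead of being woken
spuriously.

#include <linux/kernel.h>
#include <linux/sched.h>
#include <linux/wait.h>

/* Illustrative only: a waiter that sleeps until 'threshold' events exist */
struct threshold_wait {
	struct wait_queue_entry	wq_entry;
	unsigned int		threshold;	/* events needed to wake */
};

static int threshold_wake_function(struct wait_queue_entry *wq_entry,
				   unsigned int mode, int sync, void *key)
{
	struct threshold_wait *tw =
		container_of(wq_entry, struct threshold_wait, wq_entry);
	unsigned int events = *(unsigned int *)key;

	/* Not enough events yet: leave this waiter asleep */
	if (events < tw->threshold)
		return 0;

	return autoremove_wake_function(wq_entry, mode, sync, key);
}

/* Waker side: pass the current completion count as the wake key */
static void wake_if_threshold_reached(struct wait_queue_head *wq,
				      unsigned int events)
{
	if (waitqueue_active(wq))
		__wake_up(wq, TASK_NORMAL, 0, &events);
}

The waiting side would set threshold = min_events, install
threshold_wake_function as the entry's wake callback (e.g. via
init_waitqueue_func_entry()), and then use the usual
prepare_to_wait()/schedule()/finish_wait() loop. The point is only that the
filtering moves from the sleeper, which otherwise re-checks the condition
after every wakeup, to the waker, which skips sleepers below their threshold.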