If we register an eventfd with io_uring for completion notification, and
then subsequently poll that very descriptor, we can trigger a deadlock
scenario. Once a request completes and signals the eventfd context, that
will in turn trigger the poll command to complete. When that poll request
completes, it'll try to trigger another event on the eventfd, but this
time off the path that led us to complete the poll in the first place.
The result is a deadlock in eventfd, as it tries to take ctx->wqh.lock
in a nested fashion.

Check if the file in question for the poll request is our eventfd
context, and if it is, don't trigger a nested event for the poll
completion.

Cc: stable@xxxxxxxxxxxxxxx
Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>

---

We're holding the file here, so checking ->private_data is fine. It's
a bit iffy in that it implies knowledge that this is where eventfd
stores the ctx. We could potentially make that helper in eventfd, if
needed.

diff --git a/fs/io_uring.c b/fs/io_uring.c
index ac5340fdcdfe..5788c2139c72 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1025,16 +1025,21 @@ static inline bool io_should_trigger_evfd(struct io_ring_ctx *ctx)
 	return io_wq_current_is_worker() || in_interrupt();
 }
 
-static void io_cqring_ev_posted(struct io_ring_ctx *ctx)
+static void __io_cqring_ev_posted(struct io_ring_ctx *ctx, bool no_evfd)
 {
 	if (waitqueue_active(&ctx->wait))
 		wake_up(&ctx->wait);
 	if (waitqueue_active(&ctx->sqo_wait))
 		wake_up(&ctx->sqo_wait);
-	if (ctx->cq_ev_fd && io_should_trigger_evfd(ctx))
+	if (!no_evfd && ctx->cq_ev_fd && io_should_trigger_evfd(ctx))
 		eventfd_signal(ctx->cq_ev_fd, 1);
 }
 
+static void io_cqring_ev_posted(struct io_ring_ctx *ctx)
+{
+	__io_cqring_ev_posted(ctx, false);
+}
+
 /* Returns true if there are no backlogged entries after the flush */
 static bool io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force)
 {
@@ -3586,13 +3591,27 @@ static int io_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
 
 	if (llist_empty(&ctx->poll_llist) &&
 	    spin_trylock_irqsave(&ctx->completion_lock, flags)) {
+		bool no_evfd = false;
+
 		hash_del(&req->hash_node);
 		io_poll_complete(req, mask, 0);
 		req->flags |= REQ_F_COMP_LOCKED;
+
+		/*
+		 * If the file triggering the poll is our eventfd
+		 * context, then don't trigger another event for this
+		 * request. If we do, we can recurse, as this poll
+		 * request may be triggered by a completion that
+		 * signaled our eventfd.
+		 */
+		if (ctx->cq_ev_fd &&
+		    req->file->private_data == ctx->cq_ev_fd)
+			no_evfd = true;
+
 		io_put_req(req);
 		spin_unlock_irqrestore(&ctx->completion_lock, flags);
 
-		io_cqring_ev_posted(ctx);
+		__io_cqring_ev_posted(ctx, no_evfd);
 		req = NULL;
 	} else {
 		req->result = mask;

-- 
Jens Axboe
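
For reference, a minimal userspace sequence that sets up the recursion
described in the commit message could look like the sketch below. This
assumes liburing and is purely illustrative; it is not part of the patch
or a tested reproducer:

/*
 * Sketch: register an eventfd for CQ notifications, then poll that
 * same eventfd through io_uring. A completing request signals the
 * eventfd, which completes the poll, which (before this patch) would
 * signal the eventfd again from the poll completion path.
 */
#include <liburing.h>
#include <sys/eventfd.h>
#include <poll.h>

int main(void)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	int efd;

	if (io_uring_queue_init(8, &ring, 0) < 0)
		return 1;

	efd = eventfd(0, 0);
	if (efd < 0)
		return 1;

	/* completions on this ring will signal efd */
	io_uring_register_eventfd(&ring, efd);

	/* ...and we also ask the ring to poll efd itself */
	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_poll_add(sqe, efd, POLLIN);
	io_uring_submit(&ring);

	/* any completing request (a nop here) now signals efd, which
	 * completes the poll request off the eventfd signaling path */
	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_nop(sqe);
	io_uring_submit(&ring);

	return 0;
}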
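
On the ->private_data poke mentioned above the diff: if a helper in
eventfd were preferred, it might look something like the following.
This is a hypothetical sketch, not an existing API; the name and
placement (fs/eventfd.c, where eventfd_fops is visible) are made up:

/*
 * Hypothetical helper for fs/eventfd.c: report whether @file is the
 * eventfd backing @ctx, so callers don't need to know that eventfd
 * stores its context in file->private_data.
 */
bool eventfd_ctx_matches_file(struct eventfd_ctx *ctx, struct file *file)
{
	return file->f_op == &eventfd_fops && file->private_data == ctx;
}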