On 02/02/2021 19:52, Hao Xu wrote: > This might happen if we do epoll_wait on a uring fd while reading/writing > the former epoll fd in a sqe in the former uring instance. > So let's don't flush cqring overflow list when we fail to get the uring > lock. This leads to less accuracy, but is still ok. if (io_cqring_events(ctx) || test_bit(0, &ctx->cq_check_overflow)) mask |= EPOLLIN | EPOLLRDNORM; Instead of flushing. It'd make sense if we define poll as "there might be something, go do your peek/wait with overflow checks". Jens, is that documented anywhere? > > Reported-by: Abaci <abaci@xxxxxxxxxxxxxxxxx> > Fixes: 6c503150ae33 ("io_uring: patch up IOPOLL overflow_flush sync") > Signed-off-by: Hao Xu <haoxu@xxxxxxxxxxxxxxxxx> > --- > > Here I use mutex_trylock() to fix this issue, but this causes loss of > accuracy. I think doing cqring overflow flush in a task work maybe a > better solution. I'm think of this. Any thoughts? > > fs/io_uring.c | 31 ++++++++++++++++++++++++++++++- > 1 file changed, 30 insertions(+), 1 deletion(-) > > diff --git a/fs/io_uring.c b/fs/io_uring.c > index 38c6cbe1ab38..866e45d42ac7 100644 > --- a/fs/io_uring.c > +++ b/fs/io_uring.c > @@ -8718,7 +8718,36 @@ static __poll_t io_uring_poll(struct file *file, poll_table *wait) > smp_rmb(); > if (!io_sqring_full(ctx)) > mask |= EPOLLOUT | EPOLLWRNORM; > - io_cqring_overflow_flush(ctx, false, NULL, NULL); > + > + if (test_bit(0, &ctx->cq_check_overflow)) { > + bool should_flush = true; > + /* iopoll syncs against uring_lock, not completion_lock */ > + if (ctx->flags & IORING_SETUP_IOPOLL) { > + /* > + * avoid ABBA deadlock. > + * there could be contention like below: > + * CPU0 CPU1 > + * ---- ---- > + * lock(&ctx->uring_lock); > + * lock(&ep->mtx); > + * lock(&ctx->uring_lock); > + * lock(&ep->mtx); > + * > + * this might happen if we do epoll_wait on a uring fd while > + * read/write the former epoll fd in a sqe in the former uring > + * instance. > + * We don't flush cqring overflow list when we fail to get the > + * uring lock. This leads to less accuracy, but is still ok. > + */ > + should_flush = mutex_trylock(&ctx->uring_lock); > + } > + if (should_flush) { > + __io_cqring_overflow_flush(ctx, false, NULL, NULL); > + if (ctx->flags & IORING_SETUP_IOPOLL) > + mutex_unlock(&ctx->uring_lock); > + } > + } > + > if (io_cqring_events(ctx)) > mask |= EPOLLIN | EPOLLRDNORM; > > -- Pavel Begunkov