On 8/21/19 7:18 PM, Sagi Grimberg wrote: > >>> Hey, >>> >>> Just ran io-uring-bench on my VM to /dev/nullb0 and got the following >>> soft lockup [1], the reproducer is as simple as: >>> >>> modprobe null_blk >>> tools/io_uring/io_uring-bench /dev/nullb0 >>> >>> It looks like io_iopoll_getevents() can hog the cpu, however I don't >>> yet really know what is preventing it from quickly exceeding min and >>> punting back... >>> >>> Adding this makes the problem go away: >>> -- >>> diff --git a/fs/io_uring.c b/fs/io_uring.c >>> index 8b9dbf3b2298..aba03eee5c81 100644 >>> --- a/fs/io_uring.c >>> +++ b/fs/io_uring.c >>> @@ -779,6 +779,7 @@ static int io_iopoll_getevents(struct io_ring_ctx >>> *ctx, unsigned int *nr_events, >>> return ret; >>> if (!min || *nr_events >= min) >>> return 0; >>> + cond_resched(); >>> } >>> >>> return 1; >>> -- >>> >>> But I do not know if this is the correct way to fix this, or what >>> exactly is the issue, but thought I send it out given its so >>> easy to reproduce. >> >> I wonder what your .config is, can you attach it? > > Attached. > >> >> Also, please try my for-linus branch, it's got a few tweaks for how >> we handle polling (and when we back off). Doesn't affect the inner >> loop, so might not change anything for you. > > This is your for-linus branch (or at least the one when I sent you > the nvme pull request this week). > > The head commit on fs/io_uring.c: > 2fc7323f1c4b io_uring: fix potential hang with polled IO > > I'm only missing: > a3a0e43fd770 io_uring: don't enter poll loop if we have CQEs pending > > But that does not indicate that it addresses such an issue. > > I can still give it a shot if you think it can be resolved... > >> If not, might be better to have a need_resched() terminator in there, >> like we have in the outer loop. > > I can easily modify that, would like to understand what is preventing > the stop condition from happening though... I'm guessing because we need to free that same CPU to process the softirq that's actually completing them. null_blk is a bit special in that regard. The key in your case is that you have voluntary preempt set, so it'll never get to do that unless we yield on our own. Can you try this? diff --git a/fs/io_uring.c b/fs/io_uring.c index e7a43a354d91..c6a722996d8a 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -778,7 +778,7 @@ static int io_do_iopoll(struct io_ring_ctx *ctx, unsigned int *nr_events, static int io_iopoll_getevents(struct io_ring_ctx *ctx, unsigned int *nr_events, long min) { - while (!list_empty(&ctx->poll_list)) { + while (!list_empty(&ctx->poll_list) && !need_resched()) { int ret; ret = io_do_iopoll(ctx, nr_events, min); -- Jens Axboe