On 13/08/2020 02:32, Jens Axboe wrote:
> On 8/12/20 12:28 PM, Pavel Begunkov wrote:
>> On 12/08/2020 21:22, Pavel Begunkov wrote:
>>> On 12/08/2020 21:20, Pavel Begunkov wrote:
>>>> On 12/08/2020 21:05, Jens Axboe wrote:
>>>>> On 8/12/20 11:58 AM, Josef wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I have a weird issue on kernel 5.8.0/5.8.1, SIGINT even SIGKILL
>>>>>> doesn't work to kill this process(always state D or D+), literally I
>>>>>> have to terminate my VM because even the kernel can't kill the process
>>>>>> and no issue on 5.7.12-201, however if IOSQE_IO_LINK is not set, it
>>>>>> works
>>>>>>
>>>>>> I've attached a file to reproduce it
>>>>>> or here
>>>>>> https://gist.github.com/1Jo1/15cb3c63439d0c08e3589cfa98418b2c
>>>>>
>>>>> Thanks, I'll take a look at this. It's stuck in uninterruptible
>>>>> state, which is why you can't kill it.
>>>>
>>>> It looks like one of the hangs I've been talking about a few days ago,
>>>> an accept is inflight but can't be found by cancel_files() because it's
>>>> in a link.
>>>
>>> BTW, I described it a month ago, there were more details.
>>
>> https://lore.kernel.org/io-uring/34eb5e5a-8d37-0cae-be6c-c6ac4d85b5d4@xxxxxxxxx
>
> Yeah I think you're right. How about something like the below? That'll
> potentially cancel more than just the one we're looking for, but seems
> kind of silly to only cancel from the file table holding request and to
> the end.

The bug is not poll/timeout related; IIRC my test reproduces it with
read(pipe)->open(). See the previously sent link.

As mentioned, I'm going to patch that up, unless you beat me to it.
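
To illustrate why a request sitting in a link can be missed, here is a
rough user-space model. It is only a sketch, not kernel code: struct
toy_req and both helper functions are made up for illustration and do
not correspond to anything in fs/io_uring.c. It assumes a chain such as
read(pipe)->open(), where only the linked request pins the file table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a request; NOT the kernel's struct io_kiocb. */
struct toy_req {
	const void *files;		/* file table this request pins, or NULL */
	struct toy_req *next_link;	/* next request in the link chain, or NULL */
	bool cancelled;
};

/*
 * Head-only lookup: compares just the head of each chain against 'files'.
 * A linked request that pins 'files' is never seen.
 * Returns the number of requests newly cancelled.
 */
static size_t cancel_heads_only(struct toy_req **heads, size_t n,
				const void *files)
{
	size_t hits = 0;

	assert(heads != NULL);
	for (size_t i = 0; i < n; i++) {
		if (heads[i]->files == files && !heads[i]->cancelled) {
			heads[i]->cancelled = true;
			hits++;
		}
	}
	return hits;
}

/*
 * Link-aware lookup: if any member of a chain pins 'files', cancel the
 * whole chain starting from its head. This can cancel more than the one
 * request that matched.
 * Returns the number of requests newly cancelled.
 */
static size_t cancel_walking_links(struct toy_req **heads, size_t n,
				   const void *files)
{
	size_t hits = 0;

	assert(heads != NULL);
	for (size_t i = 0; i < n; i++) {
		struct toy_req *r;
		bool match = false;

		/* First pass: does anything in this chain pin 'files'? */
		for (r = heads[i]; r != NULL; r = r->next_link) {
			if (r->files == files) {
				match = true;
				break;
			}
		}
		if (!match)
			continue;

		/* Second pass: cancel the chain from the head down. */
		for (r = heads[i]; r != NULL; r = r->next_link) {
			if (!r->cancelled) {
				r->cancelled = true;
				hits++;
			}
		}
	}
	return hits;
}
```

With a two-element chain where the head has files == NULL and the linked
request points at the file table, cancel_heads_only() cancels nothing,
while cancel_walking_links() cancels both requests.
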
>
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 8a2afd8c33c9..0630a9622baa 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -4937,6 +5003,7 @@ static bool io_poll_remove_one(struct io_kiocb *req)
>  		io_cqring_fill_event(req, -ECANCELED);
>  		io_commit_cqring(req->ctx);
>  		req->flags |= REQ_F_COMP_LOCKED;
> +		req_set_fail_links(req);
>  		io_put_req(req);
>  	}
>  
> @@ -7935,6 +8002,47 @@ static bool io_wq_files_match(struct io_wq_work *work, void *data)
>  	return work->files == files;
>  }
>  
> +static bool __io_poll_remove_link(struct io_kiocb *preq, struct io_kiocb *req)
> +{
> +	struct io_kiocb *link;
> +
> +	if (!(preq->flags & REQ_F_LINK_HEAD))
> +		return false;
> +
> +	list_for_each_entry(link, &preq->link_list, link_list) {
> +		if (link != req)
> +			break;
> +		io_poll_remove_one(preq);
> +		return true;
> +	}
> +
> +	return false;
> +}
> +
> +/*
> + * We're looking to cancel 'req' because it's holding on to our files, but
> + * 'req' could be a link to another request. See if it is, and cancel that
> + * parent request if so.
> + */
> +static void io_poll_remove_link(struct io_ring_ctx *ctx, struct io_kiocb *req)
> +{
> +	struct hlist_node *tmp;
> +	struct io_kiocb *preq;
> +	int i;
> +
> +	spin_lock_irq(&ctx->completion_lock);
> +	for (i = 0; i < (1U << ctx->cancel_hash_bits); i++) {
> +		struct hlist_head *list;
> +
> +		list = &ctx->cancel_hash[i];
> +		hlist_for_each_entry_safe(preq, tmp, list, hash_node) {
> +			if (__io_poll_remove_link(preq, req))
> +				break;
> +		}
> +	}
> +	spin_unlock_irq(&ctx->completion_lock);
> +}
> +
>  static void io_uring_cancel_files(struct io_ring_ctx *ctx,
>  				  struct files_struct *files)
>  {
> @@ -7989,6 +8097,8 @@ static void io_uring_cancel_files(struct io_ring_ctx *ctx,
>  			}
>  		} else {
>  			io_wq_cancel_work(ctx->io_wq, &cancel_req->work);
> +			/* could be a link, check and remove if it is */
> +			io_poll_remove_link(ctx, cancel_req);
>  			io_put_req(cancel_req);
>  		}
>  
>

-- 
Pavel Begunkov