On Thu, Aug 25, 2022 at 12:12 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
>
> On 8/25/22 10:47 AM, Song Liu wrote:
> > On Tue, Aug 23, 2022 at 10:13 AM Song Liu <song@xxxxxxxxxx> wrote:
> >>
> >> On Mon, Aug 22, 2022 at 8:15 PM Thomas Deutschmann <whissi@xxxxxxxxx> wrote:
> >>>
> >>> On 2022-08-23 03:37, Song Liu wrote:
> >>>> Thomas, have you tried to bisect with the fio repro?
> >>>
> >>> Yes, just finished:
> >>>
> >>>> d32d3d0b47f7e34560ae3c55ddfcf68694813501 is the first bad commit
> >>>> commit d32d3d0b47f7e34560ae3c55ddfcf68694813501
> >>>> Author: Christoph Hellwig
> >>>> Date: Mon Jun 14 13:17:34 2021 +0200
> >>>>
> >>>> nvme-multipath: set QUEUE_FLAG_NOWAIT
> >>>>
> >>>> The nvme multipathing code just dispatches bios to one of the blk-mq
> >>>> based paths and never blocks on its own, so set QUEUE_FLAG_NOWAIT
> >>>> to support REQ_NOWAIT bios.
> >>>
> >>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d32d3d0b47f7e34560ae3c55ddfcf68694813501
> >>>
> >>>
> >>> So another NOWAIT issue -- similar to the bad commit which is causing
> >>> the mdraid issue I already found
> >>> (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0f9650bd838efe5c52f7e5f40c3204ad59f1964d).
> >>>
> >>> Reverting the commit, i.e. deleting
> >>>
> >>> blk_queue_flag_set(QUEUE_FLAG_NOWAIT, head->disk->queue);
> >>>
> >>> fixes the problem for me. Well, sort of. Looks like this will disable
> >>> io_uring. fio reproducer fails with
> >>
> >> My system doesn't have multipath enabled. I guess bisect will point to something
> >> else here.
> >>
> >> I am afraid we won't get more information from bisect.
> >
> > OK, I am able to pinpoint the issue, and Jens found the proper fix for
> > it (see below, also available in [1]). It survived 100 runs of the
> > repro fio job.
> >
> > Thomas, please give it a try.
> >
> > Thanks,
> > Song
> >
> > diff --git c/fs/io_uring.c w/fs/io_uring.c
> > index 3f8a79a4affa..72a39f5ec5a5 100644
> > --- c/fs/io_uring.c
> > +++ w/fs/io_uring.c
> > @@ -4551,7 +4551,12 @@ static int io_write(struct io_kiocb *req, unsigned int issue_flags)
> >  copy_iov:
> >  		iov_iter_restore(&s->iter, &s->iter_state);
> >  		ret = io_setup_async_rw(req, iovec, s, false);
> > -		return ret ?: -EAGAIN;
> > +		if (!ret) {
> > +			if (kiocb->ki_flags & IOCB_WRITE)
> > +				kiocb_end_write(req);
> > +			return -EAGAIN;
> > +		}
> > +		return 0;
>
> This should be 'return ret;' for that last line. I had to double check
> the ones I did, but they did get it right. But I did a double take when
> I saw this one :-)

Ah, right... "ret ?: -EAGAIN" is a lot of information..

Song

>
> It'll work fine for testing as we won't hit errors here unless we run
> out of memory, so...
>
> --
> Jens Axboe
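
For anyone skimming the thread, here is a small userspace sketch of why the last line matters. It is not the kernel code: the three helper names are made up for illustration, the kiocb_end_write() call is left out, and it relies on the GNU C "a ?: b" extension (gcc/clang), which evaluates to a when a is nonzero and to b otherwise.

```c
#include <errno.h>

/* Hypothetical stand-in for the original line: "return ret ?: -EAGAIN;".
 * GNU C extension: yields ret if ret is nonzero, otherwise -EAGAIN. */
static int ret_elvis(int ret)
{
	return ret ?: -EAGAIN;
}

/* Hypothetical stand-in for the expanded form with the correction
 * suggested in the review: a nonzero ret is passed through unchanged. */
static int ret_expanded_fixed(int ret)
{
	if (!ret)
		return -EAGAIN;
	return ret;
}

/* Hypothetical stand-in for the expanded form as posted: a nonzero ret
 * (for example -ENOMEM from the async setup) is turned into 0. */
static int ret_expanded_as_posted(int ret)
{
	if (!ret)
		return -EAGAIN;
	return 0;
}
```

With ret == 0 all three return -EAGAIN, so the posted version behaves the same on the success path that the fio repro exercises. They only differ when ret is an error: the first two return that error, while the posted one returns 0 and hides it.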