On Thu, May 13, 2021 at 07:50:26PM -0700, Keith Busch wrote: > On Wed, May 12, 2021 at 03:12:37PM -0700, Keith Busch wrote: > > On Wed, May 12, 2021 at 03:03:40PM -0700, Sagi Grimberg wrote: > > > On 5/12/21 6:15 AM, Christoph Hellwig wrote: > > > > + * > > > > + * Note: the caller must either be the context that submitted @bio, or > > > > + * be in a RCU critical section to prevent freeing of @bio. > > > > + */ > > > > +int bio_poll(struct bio *bio, unsigned int flags) > > > > +{ > > > > + struct request_queue *q = bio->bi_bdev->bd_disk->queue; > > > > + blk_qc_t cookie = READ_ONCE(bio->bi_cookie); > > > > + > > > > + if (cookie == BLK_QC_T_NONE || > > > > + !test_bit(QUEUE_FLAG_POLL, &q->queue_flags)) > > > > + return 0; > > > > + > > > > + if (current->plug) > > > > + blk_flush_plug_list(current->plug, false); > > > > + > > > > + /* not yet implemented, so this should not happen */ > > > > + if (WARN_ON_ONCE(!queue_is_mq(q))) > > > > > > What happens if the I/O wasn't (yet) queued to the bottom device > > > (i.e. no available path in nvme-mpath)? > > > > > > In this case the disk would be the mpath device node (which is > > > not mq...) > > > > The bi_cookie should remain BLK_QC_T_NONE in that case, so we wouldn't > > get to the warning. But if that does happen, it doesn't appear that > > anyone is going to wake up thread that needs to poll for this bio's > > completion when a path becomes available for dispatch. I think it would > > make sense for nvme-mpath to just clear the REQ_POLLED flag if it > > doesn't immediately have viable path. > > It looks like there is a way to hit this warning. If you induce a path > error to create a failover, the nvme-mpath will reset the bio's bdev to > the mpath one, which is not MQ. I can occaisionally hit it with a fault > injection test. Oh, and it gets a bit worse. Using pvsync2 will block forever once this warning is hit, or if nvme-mpath had to requeue the IO for lack of a path. io_uring doesn't have that hang problem though. Christoph, I think patch 15 needs something like this: --- diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c index 7febdb57f690..d40a9331daf7 100644 --- a/drivers/nvme/host/multipath.c +++ b/drivers/nvme/host/multipath.c @@ -419,6 +419,11 @@ static void nvme_requeue_work(struct work_struct *work) * path. */ bio_set_dev(bio, head->disk->part0); + + if (bio->bi_opf & REQ_POLLED) { + bio->bi_opf &= ~REQ_POLLED; + bio->bi_cookie = BLK_QC_T_NONE; + } submit_bio_noacct(bio); } } -- This should fix the hang since requeued bio's will use an interrupt driven queue, but it doesn't fix the warning. The recent commit "nvme-multipath: reset bdev to ns head when failover" looks like it makes preventing the polling thread from using the non-MQ head disk not possible.