Hi Christoph,

with your minimal fix

On Tue, 2022-11-08 at 08:48 +0100, Christoph Hellwig wrote:
> Below is the minimal fix. I'll see if I sort out the mess that is
> probe/reset failure vs ->remove a bit better, though.
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index f94b05c585cbc..577bacdcfee08 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -5160,6 +5160,8 @@ EXPORT_SYMBOL_GPL(nvme_start_freeze);
>
>  void nvme_stop_queues(struct nvme_ctrl *ctrl)
>  {
> +	if (!ctrl->tagset)
> +		return;
>  	if (!test_and_set_bit(NVME_CTRL_STOPPED, &ctrl->flags))
>  		blk_mq_quiesce_tagset(ctrl->tagset);
>  	else
> @@ -5169,6 +5171,8 @@ EXPORT_SYMBOL_GPL(nvme_stop_queues);
>
>  void nvme_start_queues(struct nvme_ctrl *ctrl)
>  {
> +	if (!ctrl->tagset)
> +		return;
>  	if (test_and_clear_bit(NVME_CTRL_STOPPED, &ctrl->flags))
>  		blk_mq_unquiesce_tagset(ctrl->tagset);
>  }

on next-20221108 the kernel no longer crashes when I run the short
test script. dmesg shows:

Nov 08 17:38:51 a46lp24.lnxne.boe kernel: nvme nvme0: pci function 0004:00:00.0
Nov 08 17:38:51 a46lp24.lnxne.boe kernel: nvme nvme0: failed to mark controller CONNECTING
Nov 08 17:38:51 a46lp24.lnxne.boe kernel: nvme nvme0: Removing after probe failure status: -16
Nov 08 17:38:52 a46lp24.lnxne.boe kernel: pci 0004:00:00.0: Removing from iommu group 0

while the kernel remains up. I can even
- rescan the PCI bus (to bring back the NVMe drive), and
- run the test script multiple times.

So from my point of view this band-aid is worth incorporating while the
larger overhaul in
https://lore.kernel.org/linux-nvme/20221108150252.2123727-1-hch@xxxxxx/
is out for review and testing.

Feel free to add my
Tested-by: Gerd Bayer <gbayer@xxxxxxxxxxxxx>

Thank you,
Gerd Bayer