On Tue, Feb 15, 2022 at 08:17:31PM +0100, Christoph Hellwig wrote:
> On Mon, Feb 14, 2022 at 10:51:07AM +0100, Markus Blöchl wrote:
> > After the surprise removal of a mounted NVMe disk the pciehp task
> > reliably hangs forever with a trace similar to this one:
>
> Do you have a specific reproducer?  At least with doing a
>
> echo 1 > /sys/.../remove
>
> while running fsx on a file system I can't actually reproduce it.

That's a graceful removal. You need to do something to terminate the
connection without the driver knowing about it. If you don't have a
hotplug capable system, you can do something slightly destructive to the
PCI link to force an ungraceful teardown, though you'll need to wait for
IO timeout before anything interesting will happen.

  # setpci -s "${slot}" CAP_EXP+10.w=10:10

The "$slot" needs to be the B:D.f of the bridge connecting to your nvme
end device. An example getting it for a non-multipath PCIe attached
nvme0n1:

  # readlink -f /sys/block/nvme0n1/device | grep -Eo '[0-9a-f]{4,5}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]' | tail -2 | head -1
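
The slot lookup above can be wrapped in a small helper if you want to script it. This is only a sketch: the function name pci_parent_bridge and the sample sysfs path are made up for illustration, and the one thing it adds over the one-liner is a check that the path really contains a bridge and an endpoint. Without that check, a device sitting directly on the root bus would yield its own address.

```shell
#!/bin/sh
# Hypothetical helper (not part of any tool): given a resolved sysfs device
# path, print the PCI address of the bridge directly above the endpoint.
pci_parent_bridge() {
    # Collect every domain:bus:dev.fn component in the path, one per line.
    addrs=$(printf '%s\n' "$1" |
        grep -Eo '[0-9a-f]{4,5}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]')
    # Need at least two addresses: the bridge and the endpoint itself.
    [ "$(printf '%s\n' "$addrs" | wc -l)" -ge 2 ] || return 1
    # Second-to-last address is the bridge, last one is the nvme endpoint.
    printf '%s\n' "$addrs" | tail -2 | head -1
}

# Illustrative path only; on a real system use:
#   readlink -f /sys/block/nvme0n1/device
sample='/sys/devices/pci0000:00/0000:00:1d.0/0000:3d:00.0/nvme/nvme0'
pci_parent_bridge "$sample"    # prints 0000:00:1d.0
```

On a live system this would be used as slot=$(pci_parent_bridge "$(readlink -f /sys/block/nvme0n1/device)") and then fed to the setpci command above. The sketch deliberately does not run setpci itself, since that takes the link down.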