I've tracked down the problem to pciehp_ist() deadlocking on nvme
removal. I blame this on nvme bugs.

# ls /sys/bus/pci/devices/0000\:b0\:04.0/msi_irqs/
44
# ps aux |grep pcie
root       529  0.0  0.0      0     0 ?        D    11:04   0:00 [irq/44-pciehp]
# cat /proc/529/stack
[<0>] __flush_work+0x11a/0x1d0
[<0>] __cancel_work_timer+0x10a/0x190
[<0>] nvme_remove+0x38/0x170 [nvme]
[<0>] pci_device_remove+0x3b/0xc0
[<0>] device_release_driver_internal+0x180/0x240
[<0>] pci_stop_bus_device+0x69/0x90
[<0>] pci_stop_and_remove_bus_device+0xe/0x20
[<0>] pciehp_unconfigure_device+0x84/0x140
[<0>] pciehp_disable_slot+0x67/0x110
[<0>] pciehp_handle_presence_or_link_change+0x25f/0x400
[<0>] pciehp_ist+0x22b/0x230
[<0>] irq_thread_fn+0x1f/0x50
[<0>] irq_thread+0xf3/0x190
[<0>] kthread+0x112/0x130
[<0>] ret_from_fork+0x35/0x40
[<0>] 0xffffffffffffffff

This is about two hours after the drive was removed, give or take a
timezone. (A minimal userspace sketch of this flush deadlock pattern is
appended after the quoted log below.)

<abridged_obligatory_frustrated_developer_rant>
This took about two damn weeks to track down. Where do I have to go to
collect the coffee rations?
</abridged_obligatory_frustrated_developer_rant>

Alex

On 12/18/2018 04:28 PM, Gagniuc, Alexandru - Dell Team wrote:
> Hi,
>
> I've been seeing two issues with this commit. One was a NULL
> dereference that appears to have been fixed in 4.20-rc7.
>
> For the second issue, I've included a system log in Appendix B with an
> explanation of what events are being generated -- all done with a
> hardware yanker, not ACPI or fake injection techniques.
>
> After the second or so removal, the kernel thinks the device is gone
> and fails to recognize the hotplug interrupts that would bring the
> device back online. The hardware continues to send hotplug interrupts,
> but the handler doesn't want to recognize them -- look for the
> "pciehp_isr hit" messages in the logs.
>
> I'm seeing this issue starting with commit
> cb4bfda62afa "nvme-pci: fix hot removal during error handling"
> I can't reproduce it with the commit right before it.
>
> I'm not sure if this is a PCI or an NVMe bug, and it is baffling as to
> what the heck is going on. Even if I revert the commit from v4.20-rc7,
> I continue to see the hotplug interrupt misfire. I suspect it's the
> combination of several changes that breaks things.
>
> Who let the bugs out?
>
> Alex
>
>
> Appendix B: System log with explanations
>
> [ 133.772812] IPv6: ADDRCONF(NETDEV_CHANGE): eno1: link becomes ready
>
> **** Lanes 0-2 are disconnected here
>
> [ 599.231003] pcieport 0000:b0:04.0: pciehp_isr hit
> [ 599.231067] pciehp 0000:b0:04.0:pcie204: Slot(178): Link Down
> [ 599.272506] pcieport 0000:b0:04.0: pciehp_isr hit
> [ 599.299943] pciehp 0000:b0:04.0:pcie204: Slot(178): Card present
> [ 599.299947] pciehp 0000:b0:04.0:pcie204: Slot(178): Link Up
> [ 599.605523] pci 0000:b1:00.0: [8086:0a55] type 00 class 0x010802
> [ 599.605565] pci 0000:b1:00.0: reg 0x10: [mem 0x00000000-0x00003fff 64bit]
> [ 599.605612] pci 0000:b1:00.0: Max Payload Size set to 256 (was 128, max 512)
> [ 599.605618] pci 0000:b1:00.0: enabling Extended Tags
> [ 599.605845] acpi_run_hpx: Type 3 record not supported
> [ 599.605984] pci 0000:b1:00.0: 7.876 Gb/s available PCIe bandwidth, limited by 8 GT/s x1 link at 0000:b0:04.0 (capable of 31.504 Gb/s with 8 GT/s x4 link)
> [ 599.606356] pcieport 0000:b0:04.0: bridge window [io 0x1000-0x0fff] to [bus b1] add_size 1000
> [ 599.606364] pcieport 0000:b0:04.0: BAR 13: no space for [io size 0x1000]
> [ 599.606368] pcieport 0000:b0:04.0: BAR 13: failed to assign [io size 0x1000]
> [ 599.606371] pcieport 0000:b0:04.0: BAR 13: no space for [io size 0x1000]
> [ 599.606375] pcieport 0000:b0:04.0: BAR 13: failed to assign [io size 0x1000]
> [ 599.606382] pci 0000:b1:00.0: BAR 0: assigned [mem 0xe1500000-0xe1503fff 64bit]
> [ 599.606426] pcieport 0000:b0:04.0: PCI bridge to [bus b1]
> [ 599.606434] pcieport 0000:b0:04.0: bridge window [mem 0xe1500000-0xe15fffff]
> [ 599.606440] pcieport 0000:b0:04.0: bridge window [mem 0x386000000000-0x3860001fffff 64bit pref]
> [ 599.608379] nvme nvme4: pci function 0000:b1:00.0
> [ 599.608680] nvme 0000:b1:00.0: enabling device (0000 -> 0002)
> [ 603.386826] nvme1n2: p1
> [ 603.475185] md: super_written gets error=10
> [ 603.475192] md/raid1:mdX: Disk failure on dm-13, disabling device.
> md/raid1:mdX: Operation continuing on 2 devices.
> [ 603.519511] device-mapper: raid: Faulty raid1 device #1 has readable super block. Attempting to revive it.
>
> *** Lanes 0-2 are reconnected
>
> [ 603.548495] md: recovery of RAID array mdX
>
> *** All PCIe lanes are disconnected here
>
> [ 623.162103] pcieport 0000:b0:04.0: pciehp_isr hit
> [ 623.162169] pciehp 0000:b0:04.0:pcie204: Slot(178): Link Down
> [ 623.168534] pcieport 0000:b0:04.0: pciehp_isr hit
> [ 623.185774] print_req_error: I/O error, dev nvme1n2, sector 7860864
> [ 623.185812] print_req_error: I/O error, dev nvme1n2, sector 7861248
> [ 623.185817] print_req_error: I/O error, dev nvme1n2, sector 7857920
> [ 623.185822] print_req_error: I/O error, dev nvme1n2, sector 7858432
> [ 623.185826] print_req_error: I/O error, dev nvme1n2, sector 7858560
> [ 623.185830] print_req_error: I/O error, dev nvme1n2, sector 7862144
> [ 623.185835] print_req_error: I/O error, dev nvme1n2, sector 7858944
> [ 623.185870] md/raid1:mdX: Disk failure on dm-13, disabling device.
> md/raid1:mdX: Operation continuing on 2 devices.
> [ 623.185873] md: mdX: recovery interrupted.
> [ 623.185881] print_req_error: I/O error, dev nvme1n2, sector 7861376
> [ 623.197723] print_req_error: I/O error, dev nvme1n2, sector 7859072
> [ 623.197727] print_req_error: I/O error, dev nvme1n2, sector 7862528
>
> *** All PCIe lanes are re-connected
>
> [ 628.842766] pcieport 0000:b0:04.0: pciehp_isr hit
> [ 628.842829] pciehp 0000:b0:04.0:pcie204: Slot(178): Link Up
> [ 629.147089] pci 0000:b1:00.0: [8086:0a55] type 00 class 0x010802
> [ 629.147130] pci 0000:b1:00.0: reg 0x10: [mem 0x00000000-0x00003fff 64bit]
> [ 629.147177] pci 0000:b1:00.0: Max Payload Size set to 256 (was 128, max 512)
> [ 629.147182] pci 0000:b1:00.0: enabling Extended Tags
> [ 629.147401] acpi_run_hpx: Type 3 record not supported
> [ 629.147888] pcieport 0000:b0:04.0: bridge window [io 0x1000-0x0fff] to [bus b1] add_size 1000
> [ 629.147896] pcieport 0000:b0:04.0: BAR 13: no space for [io size 0x1000]
> [ 629.147899] pcieport 0000:b0:04.0: BAR 13: failed to assign [io size 0x1000]
> [ 629.147903] pcieport 0000:b0:04.0: BAR 13: no space for [io size 0x1000]
> [ 629.147906] pcieport 0000:b0:04.0: BAR 13: failed to assign [io size 0x1000]
> [ 629.147913] pci 0000:b1:00.0: BAR 0: assigned [mem 0xe1500000-0xe1503fff 64bit]
> [ 629.147926] pcieport 0000:b0:04.0: PCI bridge to [bus b1]
> [ 629.147933] pcieport 0000:b0:04.0: bridge window [mem 0xe1500000-0xe15fffff]
> [ 629.147939] pcieport 0000:b0:04.0: bridge window [mem 0x386000000000-0x3860001fffff 64bit pref]
> [ 629.148891] nvme nvme2: pci function 0000:b1:00.0
> [ 629.148932] nvme 0000:b1:00.0: enabling device (0000 -> 0002)
> [ 629.239386] pcieport 0000:b0:04.0: pciehp_isr hit
> [ 629.239420] pciehp 0000:b0:04.0:pcie204: Slot(178): Card not present
>
> *** Lanes not disconnected again until here
>
> [ 635.951171] pcieport 0000:b0:04.0: pciehp_isr hit
> [ 635.992718] pcieport 0000:b0:04.0: pciehp_isr hit
> [ 659.866526] pcieport 0000:b0:04.0: pciehp_isr hit
> [ 659.877189] pcieport 0000:b0:04.0: pciehp_isr hit
> [ 665.555242] pcieport 0000:b0:04.0: pciehp_isr hit
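
For anyone who wants to see the shape of the hang without the hardware:
the stack at the top is the usual "synchronously flush a work item that
can't make progress" pattern -- the irq/44-pciehp thread sits in
__flush_work() via nvme_remove(), waiting on work that never finishes.
Below is a minimal userspace sketch of that pattern only. removal_lock,
pending_work() and the single-lock cycle are made up for illustration;
this is an analogy, not a claim about which resource the real work item
is actually stuck on.

/* flush-deadlock-sketch.c -- hangs by design; build with `gcc -pthread`. */
#include <pthread.h>
#include <stdio.h>

/* Stand-in for whatever the removal path holds while it waits. */
static pthread_mutex_t removal_lock = PTHREAD_MUTEX_INITIALIZER;

/* Analog of the work item being flushed: it cannot finish until
 * removal_lock is released. */
static void *pending_work(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&removal_lock);	/* never acquired */
	puts("work ran");
	pthread_mutex_unlock(&removal_lock);
	return NULL;
}

int main(void)
{
	pthread_t work;

	/* Analog of pciehp_ist() entering the removal path. */
	pthread_mutex_lock(&removal_lock);
	pthread_create(&work, NULL, pending_work, NULL);

	/* Analog of __cancel_work_timer()/__flush_work(): wait for the
	 * work to complete while still holding what it needs. This join
	 * never returns, just like the D-state irq/44-pciehp thread. */
	pthread_join(work, NULL);

	pthread_mutex_unlock(&removal_lock);
	return 0;
}

Run it and the process wedges in pthread_join() forever, which is the
userspace equivalent of the two-hour D state shown in /proc/529/stack.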