https://bugzilla.kernel.org/show_bug.cgi?id=202055

--- Comment #10 from Alex Williamson (alex.williamson@xxxxxxxxxx) ---

Ok, how about we try a secondary bus reset then. For testing purposes we're
going to trigger a secondary bus reset outside of the control of the kernel,
so the device state will not be restored after this. We can look at the PCI
config space, but don't expect the device to work until the system is
rebooted.

To start we need to identify the upstream port for the device. My system will
be different from yours, so extrapolate as needed:

# lspci -tv | grep -i nvme
           +-1c.4-[04]----00.0  Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981

This shows my Samsung NVMe drive at 4:00.0 is attached to the root port at
00:1c.4, which is the bridge we'll be using to generate the reset. Substitute
the bridge above your NVMe controller at 6:00.0.

We can then read the bridge control register using:

# setpci -s 00:1c.4 BRIDGE_CONTROL
0000

The bus reset procedure is to set the secondary bus reset bit briefly, clear
it, then wait for the bus to recover:

# setpci -s 00:1c.4 BRIDGE_CONTROL=40:40; sleep 0.1; setpci -s 00:1c.4 BRIDGE_CONTROL=0:40; sleep 1

(Don't forget to replace each occurrence of 00:1c.4 with the port the NVMe
drive is attached to in your system.)

From here, check the MSI-X Count of the NVMe device. It would be interesting
to test starting with Count=16, binding to vfio-pci, and replacing the
'echo 1 > reset' with the above to see what Count reports afterwards. Also,
after rebooting the system, put the device back into a state where it reports
Count=22, then try the secondary bus reset above to see if it returns the
device to Count=16. If this is a better reset method for this device, we can
implement a device-specific reset in the kernel that does this rather than an
FLR.
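For reference, the MSI-X vector count can be read straight from the device's
MSI-X capability with lspci. A minimal check, assuming the NVMe controller is
at 6:00.0 as described earlier in this thread:

# lspci -vvv -s 06:00.0 | grep -i 'msi-x'

The Count= field in that output is the value to compare before and after the
reset (16 vs. 22 in this bug).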
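For convenience, the whole sequence can be scripted. The sketch below is
untested and makes two assumptions: the NVMe controller lives at 0000:06:00.0,
and the parent directory of a PCI device in sysfs is its upstream bridge
(true for a device behind a root port):

#!/bin/sh
# Pulse the Secondary Bus Reset bit on the upstream bridge of an NVMe
# device, then read back its advertised MSI-X vector count.
# Assumption: the device address below matches your system.
DEV=0000:06:00.0

# In sysfs, /sys/bus/pci/devices/$DEV resolves to a path whose parent
# component is the upstream bridge, e.g. .../0000:00:1c.4/0000:06:00.0.
BRIDGE=$(basename "$(dirname "$(readlink -f /sys/bus/pci/devices/$DEV)")")
echo "Upstream bridge: $BRIDGE"

# Set the Secondary Bus Reset bit (0x40) in Bridge Control, clear it,
# then give the bus a moment to recover, as in the manual steps above.
setpci -s "$BRIDGE" BRIDGE_CONTROL=40:40
sleep 0.1
setpci -s "$BRIDGE" BRIDGE_CONTROL=0:40
sleep 1

# Report the MSI-X Count the device now advertises.
lspci -vvv -s "$DEV" | grep -i 'msi-x'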