On Mon, Mar 27, 2023 at 05:43:18PM +0000, Aleksander Trofimowicz wrote:
>
> Keith Busch <kbusch@xxxxxxxxxx> writes:
>
> > On Mon, Mar 27, 2023 at 09:33:59AM -0500, Bjorn Helgaas wrote:
> >> Forwarding to NVMe folks, lists for visibility.
> >>
> >> ----- Forwarded message from bugzilla-daemon@xxxxxxxxxx -----
> >>
> >> https://bugzilla.kernel.org/show_bug.cgi?id=217251
> >> ...
> >>
> >> Created attachment 304031
> >> --> https://bugzilla.kernel.org/attachment.cgi?id=304031&action=edit
> >> the tracing of nvme_pci_enable() during re-insertion
> >>
> >> Hi,
> >>
> >> There is a JHL7540-based device that may host an NVMe device. After the first
> >> insertion an nvme drive is properly discovered and handled by the relevant
> >> modules. Once disconnected, any further attempts are not successful. The device
> >> is visible on a PCI bus, but nvme_pci_enable() ends up calling
> >> pci_disable_device() every time; the runtime PM status of the device is
> >> "suspended", the power status of the 04:01.0 PCI bridge is D3. Preventing the
> >> device from being power managed ("on" -> /sys/devices/../power/control)
> >> combined with device removal and pci rescan changes nothing. A host reboot
> >> restores the initial state.
> >>
> >> I would appreciate any suggestions on how to debug it further.
> >
> > Sounds the same as this report:
> >
> >   http://lists.infradead.org/pipermail/linux-nvme/2023-March/038259.html
> >
> > The driver is bailing on the device because we can't read its status register
> > out of the remapped BAR. There's nothing we can do about that from the nvme
> > driver level. Memory mapped IO has to work in order to proceed.
> >
> Thanks. I can confirm it is the same problem:
>
> a) the platform is Intel Alderlake
> b) readl(dev->bar + NVME_REG_CSTS) in nvme_pci_enable() fails
> c) reading BAR0 via setpci gives 0x00000004

It's strange too.
In your example, the kernel says:

  0000:05:00.0: BAR 0: assigned [mem 0x54000000-0x54003fff 64bit]

There is a check right after that message that ensures the kernel reads back
what it wrote. No failure was reported, so the device really did have the
expected BAR value at one point.