Keith Busch <kbusch@xxxxxxxxxx> writes:

> On Mon, Mar 27, 2023 at 09:33:59AM -0500, Bjorn Helgaas wrote:
>> Forwarding to NVMe folks, lists for visibility.
>>
>> ----- Forwarded message from bugzilla-daemon@xxxxxxxxxx -----
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=217251
>> ...
>>
>> Created attachment 304031
>> --> https://bugzilla.kernel.org/attachment.cgi?id=304031&action=edit
>> the tracing of nvme_pci_enable() during re-insertion
>>
>> Hi,
>>
>> There is a JHL7540-based device that may host an NVMe device. After the
>> first insertion, an nvme drive is properly discovered and handled by the
>> relevant modules. Once disconnected, any further insertion attempts are
>> unsuccessful. The device is visible on the PCI bus, but nvme_pci_enable()
>> ends up calling pci_disable_device() every time; the runtime PM status of
>> the device is "suspended", and the power state of the 04:01.0 PCI bridge
>> is D3. Preventing the device from being power managed (writing "on" to
>> /sys/devices/../power/control), combined with device removal and a PCI
>> rescan, changes nothing. A host reboot restores the initial state.
>>
>> I would appreciate any suggestions on how to debug this further.
>
> Sounds the same as this report:
>
> http://lists.infradead.org/pipermail/linux-nvme/2023-March/038259.html
>
> The driver is bailing on the device because we can't read its status
> register out of the remapped BAR. There's nothing we can do about that
> from the nvme driver level. Memory mapped IO has to work in order to
> proceed.

Thanks. I can confirm it is the same problem:

a) the platform is Intel Alderlake
b) readl(dev->bar + NVME_REG_CSTS) in nvme_pci_enable() fails
c) reading BAR0 via setpci gives 0x00000004 -- at
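
For context, the bail-out Keith describes is the early CSTS sanity check in
nvme_pci_enable() (drivers/nvme/host/pci.c). The following is only a
paraphrased sketch, not the exact upstream code (the helper name and the
omitted setup are illustrative), but it shows why a failed MMIO read ends in
pci_disable_device(), which matches what the trace in the bug report shows:

	static int nvme_pci_enable_sketch(struct nvme_dev *dev)
	{
		struct pci_dev *pdev = to_pci_dev(dev->dev);
		int result = -ENOMEM;

		if (pci_enable_device_mem(pdev))
			return result;

		pci_set_master(pdev);

		/*
		 * First MMIO access to the controller: if the remapped BAR is
		 * not backed by a responding device, the read returns all
		 * ones and the driver gives up right here.
		 */
		if (readl(dev->bar + NVME_REG_CSTS) == -1) {
			result = -ENODEV;
			goto disable;
		}

		/* ... interrupt setup, queue sizing, etc. ... */
		return 0;

	 disable:
		pci_disable_device(pdev);	/* the call observed in the trace */
		return result;
	}

So the failing readl() in (b) is the first thing the driver tries against the
BAR, and everything after it is just cleanup.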