Re: [Bug 215525] New: HotPlug does not work on upstream kernel 5.17.0-rc1

On 1/27/2022 7:46 AM, Mariusz Tkaczyk wrote:
On Mon, 24 Jan 2022 15:46:35 -0600
Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:

[+cc linux-pci, Hans, Lukas, Naveen, Keith, Nirmal, Jonathan]

On Mon, Jan 24, 2022 at 11:46:14AM +0000,
bugzilla-daemon@xxxxxxxxxxxxxxxxxxx wrote:
https://bugzilla.kernel.org/show_bug.cgi?id=215525

             Bug ID: 215525
            Summary: HotPlug does not work on upstream kernel 5.17.0-rc1
            Product: Drivers
            Version: 2.5
     Kernel Version: 5.17.0-rc1 upstream
           Hardware: x86-64
                 OS: Linux
               Tree: Mainline
             Status: NEW
           Severity: normal
           Priority: P1
          Component: PCI
           Assignee: drivers_pci@xxxxxxxxxxxxxxxxxxxx
           Reporter: blazej.kucman@xxxxxxxxx
         Regression: No

Created attachment 300308
   -->
https://bugzilla.kernel.org/attachment.cgi?id=300308&action=edit
dmesg

While testing on the latest upstream
kernel (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/)
we noticed that, as of the merge commit
(https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d0a231f01e5b25bacd23e6edc7c979a18a517b2b),
hotplug and hot-unplug of NVMe drives stopped working.

Rescanning the PCI bus does not help:
echo "1" > /sys/bus/pci/rescan

The issue does not reproduce on a kernel built at the preceding
commit (88db8458086b1dcf20b56682504bdb34d2bca0e2).


During hot-remove the device does not disappear; however, when we try
to do I/O on the disk, we get an I/O error and the device then
disappears.

Before the I/O, no messages about the disk appeared in dmesg; only
after the I/O did entries like the following appear:
[  177.943703] nvme nvme5: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0xffff
[  177.971661] nvme 10000:0b:00.0: can't change power state from D3cold to D0 (config space inaccessible)
[  177.981121] pcieport 10000:00:02.0: can't derive routing for PCI INT A
[  177.987749] nvme 10000:0b:00.0: PCI INT A: no GSI
[  177.992633] nvme nvme5: Removing after probe failure status: -19
[  178.004633] nvme5n1: detected capacity change from 83984375 to 0
[  178.004677] I/O error, dev nvme5n1, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0


OS: RHEL 8.4 GA
Platform: Intel Purley

The logs were collected on an older upstream kernel, but the issue
also occurs on the newest upstream
kernel (dd81e1c7d5fb126e5fbc5c9e334d7b3ec29a16a0).

Apparently worked immediately before merging the PCI changes for
v5.17 and failed immediately after:

   good: 88db8458086b ("Merge tag 'exfat-for-5.17-rc1' of
         git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat")
   bad:  d0a231f01e5b ("Merge tag 'pci-v5.17-changes' of
         git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci")

Only three commits touch pciehp:

   085a9f43433f ("PCI: pciehp: Use down_read/write_nested(reset_lock) to fix lockdep errors")
   23584c1ed3e1 ("PCI: pciehp: Fix infinite loop in IRQ handler upon power fault")
   a3b0f10db148 ("PCI: pciehp: Use PCI_POSSIBLE_ERROR() to check config reads")

None seems obviously related to me.  Blazej, could you try setting
CONFIG_DYNAMIC_DEBUG=y and booting with 'dyndbg="file pciehp* +p"' to
enable more debug messages?


Hi Bjorn,

Thanks for your suggestions. Blazej did some tests and the results
were inconclusive. He tested on two identical platforms. On the first
one hotplug didn't work, even after he reverted all of the suggested
patches. On the second one hotplug always worked.

He noticed that the first platform, where the issue was found
initially, was booted with the "pci=nommconf" parameter. After adding
this parameter on the second platform, hotplug stopped working there too.

He tested on tag pci-v5.17-changes, with CONFIG_HOTPLUG_PCI_PCIE
and CONFIG_DYNAMIC_DEBUG enabled in the config. As requested, he also
attached two dmesg logs to bugzilla, collected with the boot parameter
'dyndbg="file pciehp* +p"': one with "pci=nommconf" and one without.

The issue seems to be related to "pci=nommconf" and is probably caused
by a change outside pciehp.

Could it be related to this?

int raw_pci_read(unsigned int domain, unsigned int bus, unsigned int devfn, int reg, int len, u32 *val)
{
	if (domain == 0 && reg < 256 && raw_pci_ops)
		return raw_pci_ops->read(domain, bus, devfn, reg, len, val);
	if (raw_pci_ext_ops)
		return raw_pci_ext_ops->read(domain, bus, devfn, reg, len, val);
	return -EINVAL;
}

It looks like raw_pci_ext_ops won't be set with nommconf, and the VMD subdevice domain will be > 0.



He is currently setting up his email client so he can reply himself.

Thanks,
Mariusz




