On Fri, 28 Jan 2022 08:03:28 -0600 Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> On Fri, Jan 28, 2022 at 09:49:34PM +0800, Kai-Heng Feng wrote:
> > On Fri, Jan 28, 2022 at 9:08 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > > On Fri, Jan 28, 2022 at 09:29:31AM +0100, Mariusz Tkaczyk wrote:
> > > > On Thu, 27 Jan 2022 20:52:12 -0600 Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > > > > On Thu, Jan 27, 2022 at 03:46:15PM +0100, Mariusz Tkaczyk wrote:
> > > > > > ...
> > > > > > Thanks for your suggestions. Blazej did some tests and the results
> > > > > > were inconclusive. He tested it on two identical platforms. On the
> > > > > > first one it didn't work, even after he reverted all of the
> > > > > > suggested patches. On the second one hotplug always worked.
> > > > > >
> > > > > > He noticed that the first platform, where the issue was initially
> > > > > > found, had the boot parameter "pci=nommconf". After adding this
> > > > > > parameter on the second platform, hotplug stopped working there too.
> > > > > >
> > > > > > He tested on tag pci-v5.17-changes, with CONFIG_HOTPLUG_PCI_PCIE and
> > > > > > CONFIG_DYNAMIC_DEBUG enabled in the config. He also attached two
> > > > > > dmesg logs to bugzilla, taken with the boot parameter
> > > > > > 'dyndbg="file pciehp* +p"' as requested: one with "pci=nommconf"
> > > > > > and one without.
> > > > > >
> > > > > > The issue seems to be related to "pci=nommconf" and is probably
> > > > > > caused by a change outside pciehp.
> > > > >
> > > > > Maybe I'm missing something. If I understand correctly, the problem
> > > > > has nothing to do with the kernel version (correct me if I'm wrong!)
> > > >
> > > > The problem occurred after the merge commit. It is some kind of
> > > > regression.
> > >
> > > The bug report doesn't yet contain the evidence showing this. It only
> > > contains dmesg logs with "pci=nommconf" where pciehp doesn't work (which
> > > is the expected behavior) and a log without "pci=nommconf" where pciehp
> > > does work (which is again the expected behavior).
> > >
> > > > > PCIe native hotplug doesn't work when booted with "pci=nommconf".
> > > > > When using "pci=nommconf", obviously we can't access the extended
> > > > > PCI config space (offset 0x100-0xfff), so none of the extended
> > > > > capabilities are available.
> > > > >
> > > > > In that case, we don't even ask the platform for control of PCIe
> > > > > hotplug via _OSC. From the dmesg diff from normal (working) to
> > > > > "pci=nommconf" (not working):
> > > > >
> > > > > -Command line: BOOT_IMAGE=/boot/vmlinuz-smp ...
> > > > > +Command line: BOOT_IMAGE=/boot/vmlinuz-smp pci=nommconf ...
> > > > > ...
> > > > > -acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI HPX-Type3]
> > > > > -acpi PNP0A08:00: _OSC: platform does not support [AER LTR]
> > > > > -acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug PME PCIeCapability]
> > > > > +acpi PNP0A08:00: _OSC: OS supports [ASPM ClockPM Segments MSI HPX-Type3]
> > > > > +acpi PNP0A08:00: _OSC: not requesting OS control; OS requires [ExtendedConfig ASPM ClockPM MSI]
> > > > > +acpi PNP0A08:00: MMCONFIG is disabled, can't access extended PCI configuration space under this bridge.
> > > >
> > > > So it shouldn't have worked for years, but it has only broken recently;
> > > > that is my only objection. Could you tell me why it was working?
> > > > According to your words, it shouldn't. We are using the VMD driver,
> > > > does that matter?
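Regarding the _OSC explanation above: my (possibly naive) reading is that
pciehp only drives a port natively when the slot advertises hotplug
capability and the host bridge kept hotplug ownership after the _OSC
negotiation, which never happens with "pci=nommconf" because the kernel
cannot offer ExtendedConfig. A simplified sketch of that dependency as I
understand it (the function name is mine, this is not the actual pciehp
code):

#include <linux/pci.h>

/*
 * Simplified sketch (not the actual pciehp code): native PCIe hotplug
 * is only used when the downstream port advertises a hotplug-capable
 * slot *and* the host bridge retained hotplug ownership in the _OSC
 * negotiation.  With "pci=nommconf" the kernel never requests that
 * ownership, so the check fails for every port.
 */
static bool hotplug_is_native_sketch(struct pci_dev *bridge)
{
	struct pci_host_bridge *host = pci_find_host_bridge(bridge->bus);
	u32 slot_cap;

	pcie_capability_read_dword(bridge, PCI_EXP_SLTCAP, &slot_cap);
	if (!(slot_cap & PCI_EXP_SLTCAP_HPC))
		return false;			/* no hotplug-capable slot */

	return host->native_pcie_hotplug;	/* granted (or not) by _OSC */
}

That would explain why, with "pci=nommconf", nothing behind a plain root
port ever hot-plugs, regardless of kernel version.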
> > > 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on PCIe features") looks like
> > > it could be related. Try reverting that commit and see whether it makes
> > > a difference.
> >
> > The affected NVMe is indeed behind the VMD domain, so I think the commit
> > can make a difference.
> >
> > Does VMD behave differently on laptops and servers?
> >
> > Anyway, I agree that the issue really lies in "pci=nommconf".
>
> Oh, I have a guess:
>
> - With "pci=nommconf", prior to v5.17-rc1, pciehp did not work in general,
>   but *did* work for NVMe behind a VMD. As of v5.17-rc1, pciehp no longer
>   works for NVMe behind VMD.
>
> - Without "pci=nommconf", pciehp works as expected for all devices,
>   including NVMe behind VMD, both before and after v5.17-rc1.
>
> Is that what you're observing?
>
> If so, I doubt there's anything to fix other than getting rid of
> "pci=nommconf".
>
> Bjorn

I hadn't tested with VMD disabled before. I have now verified it and my
observations are as follows:

OS: RHEL 8.4
NO  - hotplug not working
YES - hotplug working

pci=nommconf added:

+--------------+-------------------+---------------------+--------------+
|              | pci-v5.17-changes | revert-04b12ef163d1 | inbox kernel |
+--------------+-------------------+---------------------+--------------+
| VMD enabled  | NO                | YES                 | YES          |
+--------------+-------------------+---------------------+--------------+
| VMD disabled | NO                | NO                  | NO           |
+--------------+-------------------+---------------------+--------------+

without pci=nommconf:

+--------------+-------------------+---------------------+--------------+
|              | pci-v5.17-changes | revert-04b12ef163d1 | inbox kernel |
+--------------+-------------------+---------------------+--------------+
| VMD enabled  | YES               | YES                 | YES          |
+--------------+-------------------+---------------------+--------------+
| VMD disabled | YES               | YES                 | YES          |
+--------------+-------------------+---------------------+--------------+

So the results confirm your assumptions, but I also confirmed that reverting
04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on PCIe features") makes it work
the same way as the inbox kernel. We will drop the legacy parameter in our
tests.

According to my results there is a regression in VMD caused by commit
04b12ef163d1, even if it is not expected to work for NVMe anyway. Should it
be fixed?

Thanks,
Blazej
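P.S. For context, my rough understanding of what 04b12ef163d1 changes (a
paraphrased sketch of the idea, not the exact code from the vmd driver):
the synthetic host bridge that vmd creates for its own domain now inherits
the _OSC-negotiated feature ownership of the ACPI root host bridge, instead
of unconditionally enabling native hotplug on the VMD root ports:

#include <linux/pci.h>

/*
 * Paraphrased sketch of the idea behind 04b12ef163d1, not the exact
 * upstream code: copy the _OSC-negotiated feature ownership from the
 * ACPI root host bridge to the host bridge vmd creates for its domain.
 */
static void vmd_copy_host_bridge_flags(struct pci_host_bridge *root_bridge,
				       struct pci_host_bridge *vmd_bridge)
{
	vmd_bridge->native_pcie_hotplug = root_bridge->native_pcie_hotplug;
	vmd_bridge->native_shpc_hotplug = root_bridge->native_shpc_hotplug;
	vmd_bridge->native_aer = root_bridge->native_aer;
	vmd_bridge->native_pme = root_bridge->native_pme;
	vmd_bridge->native_ltr = root_bridge->native_ltr;
	vmd_bridge->native_dpc = root_bridge->native_dpc;
}

With "pci=nommconf" the root bridge never gets hotplug ownership from _OSC,
so after this change the VMD domain loses native hotplug as well, which
matches the first table above.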