Re: [Bug 215525] New: HotPlug does not work on upstream kernel 5.17.0-rc1

On Fri, 28 Jan 2022 08:03:28 -0600
Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:

> On Fri, Jan 28, 2022 at 09:49:34PM +0800, Kai-Heng Feng wrote:
> > On Fri, Jan 28, 2022 at 9:08 PM Bjorn Helgaas <helgaas@xxxxxxxxxx>
> > wrote:  
> > > On Fri, Jan 28, 2022 at 09:29:31AM +0100, Mariusz Tkaczyk wrote:  
> > > > On Thu, 27 Jan 2022 20:52:12 -0600
> > > > Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:  
> > > > > On Thu, Jan 27, 2022 at 03:46:15PM +0100, Mariusz Tkaczyk
> > > > > wrote:  
> > > > > > ...
> > > > > > Thanks for your suggestions. Blazej did some tests and the
> > > > > > results were inconclusive. He tested it on two identical
> > > > > > platforms. On the first one it didn't work, even after he
> > > > > > reverted all the suggested patches. On the second one
> > > > > > hotplug always worked.
> > > > > >
> > > > > > He noticed that on the first platform, where the issue was
> > > > > > initially found, the boot parameter "pci=nommconf" was set.
> > > > > > After adding this parameter on the second platform, hotplug
> > > > > > stopped working there too.
> > > > > >
> > > > > > He tested on tag pci-v5.17-changes, with
> > > > > > CONFIG_HOTPLUG_PCI_PCIE and CONFIG_DYNAMIC_DEBUG enabled in
> > > > > > the config. He also attached two dmesg logs to bugzilla,
> > > > > > captured with the boot parameter 'dyndbg="file pciehp* +p"'
> > > > > > as requested: one with "pci=nommconf" and one without.
> > > > > >
> > > > > > The issue seems to be related to "pci=nommconf" and is
> > > > > > probably caused by a change outside pciehp.  
> > > > >
> > > > > Maybe I'm missing something.  If I understand correctly, the
> > > > > problem has nothing to do with the kernel version (correct me
> > > > > if I'm wrong!)  
> > > >
> > > > The problem occurred after the merge commit. It is some kind of
> > > > regression.  
> > >
> > > The bug report doesn't yet contain the evidence showing this.  It
> > > only contains dmesg logs with "pci=nommconf" where pciehp doesn't
> > > work (which is the expected behavior) and a log without
> > > "pci=nommconf" where pciehp does work (which is again the
> > > expected behavior).
> > > > >
> > > > > PCIe native hotplug doesn't work when booted with
> > > > > "pci=nommconf". When using "pci=nommconf", obviously we can't
> > > > > access the extended PCI config space (offset 0x100-0xfff), so
> > > > > none of the extended capabilities are available.
> > > > >
> > > > > In that case, we don't even ask the platform for control of
> > > > > PCIe hotplug via _OSC.  From the dmesg diff from normal
> > > > > (working) to "pci=nommconf" (not working):
> > > > >
> > > > >   -Command line: BOOT_IMAGE=/boot/vmlinuz-smp ...
> > > > >   +Command line: BOOT_IMAGE=/boot/vmlinuz-smp pci=nommconf ...
> > > > >   ...
> > > > >   -acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI HPX-Type3]
> > > > >   -acpi PNP0A08:00: _OSC: platform does not support [AER LTR]
> > > > >   -acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug PME PCIeCapability]
> > > > >   +acpi PNP0A08:00: _OSC: OS supports [ASPM ClockPM Segments MSI HPX-Type3]
> > > > >   +acpi PNP0A08:00: _OSC: not requesting OS control; OS requires [ExtendedConfig ASPM ClockPM MSI]
> > > > >   +acpi PNP0A08:00: MMCONFIG is disabled, can't access extended PCI configuration space under this bridge.  
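
For context on that last dmesg line: with MMCONFIG disabled, config
reads above offset 0xFF fail, so the walk over PCIe extended
capabilities finds nothing and Linux never asks the platform for
native hotplug control via _OSC. A minimal sketch of such a walk,
loosely modeled on pci_find_ext_capability() in drivers/pci/pci.c
(the kernel's actual code differs):

  /*
   * Sketch only, not the kernel's literal code.  Under
   * "pci=nommconf" the config accessor cannot reach offsets
   * >= 0x100, so the first read fails or returns all-ones and
   * the walk never finds any extended capability.
   */
  static u16 sketch_find_ext_cap(struct pci_dev *dev, int cap)
  {
          u32 header;
          int ttl = (4096 - 256) / 8;     /* bound the loop */
          int pos = 0x100;                /* PCI_CFG_SPACE_SIZE */

          if (pci_read_config_dword(dev, pos, &header))
                  return 0;
          if (header == 0 || header == ~0U)
                  return 0;               /* extended space unreachable */

          while (ttl-- > 0) {
                  if (PCI_EXT_CAP_ID(header) == cap)
                          return pos;
                  pos = PCI_EXT_CAP_NEXT(header);
                  if (pos < 0x100)        /* end of list */
                          return 0;
                  if (pci_read_config_dword(dev, pos, &header))
                          return 0;
          }
          return 0;
  }

That is why the diff above shows "not requesting OS control": without
ExtendedConfig support, the kernel does not ask for control of PCIe
features such as native hotplug.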
> > > >
> > > > So it shouldn't have worked for years, yet it only broke
> > > > recently; that is the only objection I have. Could you tell me
> > > > why it was working? According to your words, it shouldn't have
> > > > been. We are using the VMD driver; does that matter?  
> > >
> > > 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on PCIe features") looks
> > > like it could be related.  Try reverting that commit and see
> > > whether it makes a difference.  
> > 
> > The affected NVMe is indeed behind a VMD domain, so I think the
> > commit could make a difference.
> > 
> > Does VMD behave differently on laptops and servers?
> > Anyway, I agree that the issue really lies in "pci=nommconf".  
> 
> Oh, I have a guess:
> 
>   - With "pci=nommconf", prior to v5.17-rc1, pciehp did not work in
>     general, but *did* work for NVMe behind a VMD.  As of v5.17-rc1,
>     pciehp no longer works for NVMe behind VMD.
> 
>   - Without "pci=nommconf", pciehp works as expected for all devices
>     including NVMe behind VMD, both before and after v5.17-rc1.
> 
> Is that what you're observing?
> 
> If so, I doubt there's anything to fix other than getting rid of
> "pci=nommconf".
> 
> Bjorn

I hadn't tested with VMD disabled before. I have verified it now, and
my observations are as follows:

OS: RHEL 8.4
NO - hotplug not working
YES - hotplug working

pci=nommconf added:
+--------------+-------------------+---------------------+--------------+
|              | pci-v5.17-changes | revert-04b12ef163d1 | inbox kernel |
+--------------+-------------------+---------------------+--------------+
| VMD enabled  | NO                | YES                 | YES          |
+--------------+-------------------+---------------------+--------------+
| VMD disabled | NO                | NO                  | NO           |
+--------------+-------------------+---------------------+--------------+

without pci=nommconf:
+--------------+-------------------+---------------------+--------------+
|              | pci-v5.17-changes | revert-04b12ef163d1 | inbox kernel |
+--------------+-------------------+---------------------+--------------+
| VMD enabled  | YES               | YES                 | YES          |
+--------------+-------------------+---------------------+--------------+
| VMD disabled | YES               | YES                 | YES          |
+--------------+-------------------+---------------------+--------------+

So, the results confirm your assumptions, but I also confirmed that
reverting 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on PCIe features")
makes it work the same as the inbox kernel.

We will drop the legacy parameter from our tests. According to my
results there is a regression in VMD caused by commit 04b12ef163d1,
even though hotplug under "pci=nommconf" is not expected to work
anyway. Should it be fixed?
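
For reference, the change in 04b12ef163d1 essentially copies the
_OSC-negotiated feature flags from the ACPI host bridge into the
synthetic host bridge that vmd creates. A condensed sketch, loosely
following vmd_copy_host_bridge_flags() in
drivers/pci/controller/vmd.c (not the literal diff):

  /*
   * Sketch only.  Before this commit the VMD host bridge defaulted
   * to native control (all flags set), which is why pciehp behind
   * VMD worked even when "pci=nommconf" made _OSC deny it.
   */
  static void sketch_copy_osc_flags(struct pci_host_bridge *root_bridge,
                                    struct pci_host_bridge *vmd_bridge)
  {
          vmd_bridge->native_pcie_hotplug = root_bridge->native_pcie_hotplug;
          vmd_bridge->native_shpc_hotplug = root_bridge->native_shpc_hotplug;
          vmd_bridge->native_aer = root_bridge->native_aer;
          vmd_bridge->native_pme = root_bridge->native_pme;
          vmd_bridge->native_ltr = root_bridge->native_ltr;
          vmd_bridge->native_dpc = root_bridge->native_dpc;
  }

With "pci=nommconf" the root bridge never gets hotplug control from
_OSC, so after this commit the VMD domain inherits that denial too,
matching the first table above.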

Thanks,
Blazej


