On 4/25/2024 10:24 AM, Bjorn Helgaas wrote:
On Wed, Apr 24, 2024 at 02:29:16PM -0700, Paul M Stillwell Jr wrote:
On 4/23/2024 5:47 PM, Bjorn Helgaas wrote:
On Tue, Apr 23, 2024 at 04:10:37PM -0700, Paul M Stillwell Jr wrote:
On 4/23/2024 2:26 PM, Bjorn Helgaas wrote:
On Mon, Apr 22, 2024 at 04:39:19PM -0700, Paul M Stillwell Jr wrote:
On 4/22/2024 3:52 PM, Bjorn Helgaas wrote:
On Mon, Apr 22, 2024 at 02:39:16PM -0700, Paul M Stillwell Jr wrote:
On 4/22/2024 1:27 PM, Bjorn Helgaas wrote:
...
_OSC negotiates ownership of features between platform firmware and
OSPM. The "native_pcie_hotplug" and similar bits mean that "IF a
device advertises the feature, the OS can use it." We clear those
native_* bits if the platform retains ownership via _OSC.
If BIOS doesn't enable the VMD host bridge and doesn't supply _OSC for
the domain below it, why would we assume that BIOS retains ownership
of the features negotiated by _OSC? I think we have to assume the OS
owns them, which is what happened before 04b12ef163d1.
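For reference, this is roughly how those bits behave in the kernel today.
An abbreviated sketch (not a literal excerpt) of drivers/pci/probe.c and
drivers/acpi/pci_root.c:

  static void pci_init_host_bridge(struct pci_host_bridge *bridge)
  {
          /* ... */
          /* Default: the OS owns every feature it has support for */
          bridge->native_pcie_hotplug = 1;
          bridge->native_shpc_hotplug = 1;
          bridge->native_aer = 1;
          bridge->native_pme = 1;
          bridge->native_ltr = 1;
          bridge->native_dpc = 1;
  }

  /* ... and only an ACPI host bridge's _OSC result clears them: */
          if (!(root->osc_control_set & OSC_PCI_EXPRESS_NATIVE_HP_CONTROL))
                  host_bridge->native_pcie_hotplug = 0;
          if (!(root->osc_control_set & OSC_PCI_EXPRESS_AER_CONTROL))
                  host_bridge->native_aer = 0;
          /* ... likewise for SHPC, PME, LTR, and DPC */

With no _OSC at all, nothing ever clears the defaults, so the OS keeps
ownership.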
Sorry, this confuses me :) If BIOS doesn't enable VMD (i.e. VMD is
disabled), then all the root ports and devices underneath VMD are visible
to the BIOS and the OS, so ACPI runs on all of them and the _OSC bits
should be set correctly.
Sorry, that was confusing. I think there are two pieces to enabling
VMD:
1) There's the BIOS "VMD enable" switch. If set, the VMD device
appears as an RCiEP and the devices behind it are invisible to the
BIOS. If cleared, VMD doesn't exist; the VMD RCiEP is hidden and
the devices behind it appear as normal Root Ports with devices below
them.
2) When the BIOS "VMD enable" is set, the OS vmd driver configures
the VMD RCiEP and enumerates things below the VMD host bridge.
In this case, BIOS enables the VMD RCiEP, but it doesn't have a
driver for it and it doesn't know how to enumerate the VMD Root
Ports, so I don't think it makes sense for BIOS to own features for
devices it doesn't know about.
That makes sense to me. It sounds like VMD should own all the features;
I just don't know how the vmd driver would set the bits other than
hotplug correctly... We know leaving them all on is problematic, but I'm
not sure what method to use to decide which of the other bits should be
set.
My starting assumption would be that we'd handle the VMD domain the
same as other PCI domains: if a device advertises a feature, the
kernel includes support for it, and the kernel owns it, we enable it.
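For AER that rule is essentially literal already; abbreviated from
drivers/pci/pcie/aer.c:

  int pcie_aer_is_native(struct pci_dev *dev)
  {
          struct pci_host_bridge *host = pci_find_host_bridge(dev->bus);

          if (!dev->aer_cap)              /* device doesn't advertise AER */
                  return 0;

          /* kernel owns it (or the user forced it via pcie_ports=native) */
          return pcie_ports_native || host->native_aer;
  }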
I've been poking around and it seems like some things (I was looking at
AER) are global to the platform. In my investigation (admittedly a small
sample of machines) there is a single entry in the BIOS to enable/disable
AER, so whatever is set in one domain should be the same in all domains.
I couldn't find settings for LTR or the other bits, but I'm not sure what
to look for in the BIOS for those.
So it seems that there are 2 categories: platform global and device
specific. AER and probably some of the others are global and can be copied
from one domain to another, but things like hotplug are device specific and
should be handled that way.
_OSC is the only mechanism for negotiating ownership of these
features, and PCI Firmware r3.3, sec 4.5.1, is pretty clear that _OSC
only applies to the hierarchy originated by the PNP0A03/PNP0A08 host
bridge that contains the _OSC method. AFAICT, there's no
global/device-specific thing here.
The BIOS may have a single user-visible setting, and it may apply that
setting to all host bridge _OSC methods, but that's just part of the
BIOS UI, not part of the firmware/OS interface.
Fair, but we are still left with the question of how to set the _OSC
bits for the VMD bridge. That would normally happen via ACPI, AFAICT, and
we don't have that for the devices behind VMD.
In the absence of a mechanism for negotiating ownership, e.g., an ACPI
host bridge device for the hierarchy, the OS owns all the PCIe
features.
I'm new to this space so I don't know what it means for the OS to own
the features. In other words, how would the vmd driver figure out what
features are supported?
If a device advertises a feature but there's a hardware problem with
it, the usual approach is to add a quirk to work around the problem.
The Correctable Error issue addressed by 04b12ef163d1 ("PCI: vmd:
Honor ACPI _OSC on PCIe features"), looks like it might be in this
category.
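As a purely hypothetical sketch of that approach (the helper name and
device ID below are made up, just to show the shape of it): mask the one
Correctable Error that floods, on the affected device only, instead of
disabling AER for a whole domain:

  static void quirk_mask_replay_timer_timeout(struct pci_dev *dev)
  {
          u32 mask;

          if (!dev->aer_cap)
                  return;

          /* Mask only the noisy Correctable Error, e.g. Replay Timer Timeout */
          pci_read_config_dword(dev, dev->aer_cap + PCI_ERR_COR_MASK, &mask);
          mask |= PCI_ERR_COR_REP_TIMER;
          pci_write_config_dword(dev, dev->aer_cap + PCI_ERR_COR_MASK, mask);
  }
  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_SAMSUNG, 0xa808, /* illustrative */
                          quirk_mask_replay_timer_timeout);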
I don't think we had a hardware problem with these Samsung (IIRC)
devices; the issue was that the vmd driver was incorrectly enabling AER
because those native_* bits get set automatically.
Where do all the Correctable Errors come from? IMO they're either
caused by some hardware issue or by a software error in programming
AER. It's possible we forget to clear the errors and we just see the
same error reported over and over. But I don't think the answer is
to copy the AER ownership from a different domain.
I looked back at the original bugzilla and I feel like the AER errors
are a red herring. AER was *supposed* to be disabled, but was incorrectly
enabled by VMD, so we are seeing errors. Yes, they may be real errors,
but my point is that the user had disabled AER, so they didn't care
whether there were errors or not (i.e. if AER had been correctly disabled
by VMD, the user would not have seen AER errors in the dmesg output).
04b12ef163d1 basically asserted "the platform knows about a hardware
issue between VMD and this NVMe and avoided it by disabling AER in
domain 0000; therefore we should also disable AER in the VMD domain."
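For reference, what 04b12ef163d1 does is roughly this (abbreviated from
drivers/pci/controller/vmd.c):

  static void vmd_copy_host_bridge_flags(struct pci_host_bridge *root_bridge,
                                         struct pci_host_bridge *vmd_bridge)
  {
          /* Apply domain 0000's _OSC results to the synthetic VMD bridge */
          vmd_bridge->native_pcie_hotplug = root_bridge->native_pcie_hotplug;
          vmd_bridge->native_shpc_hotplug = root_bridge->native_shpc_hotplug;
          vmd_bridge->native_aer = root_bridge->native_aer;
          vmd_bridge->native_pme = root_bridge->native_pme;
          vmd_bridge->native_ltr = root_bridge->native_ltr;
          vmd_bridge->native_dpc = root_bridge->native_dpc;
  }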
Your patch at
https://lore.kernel.org/linux-pci/20240408183927.135-1-paul.m.stillwell.jr@xxxxxxxxx/
says "vmd users *always* want hotplug enabled." What happens when a
platform knows about a hotplug hardware issue and avoids it by
disabling hotplug in domain 0000?
I was thinking about this also; I could look at all the root ports
underneath vmd and see if hotplug is set for any of them. If it is, then
we could set the native_*hotplug bits based on that.
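Something along these lines (purely a sketch, the helper name is mine):

  static bool vmd_any_root_port_is_hotplug(struct pci_bus *vmd_bus)
  {
          struct pci_dev *rp;
          u32 sltcap;

          list_for_each_entry(rp, &vmd_bus->devices, bus_list) {
                  if (pci_pcie_type(rp) != PCI_EXP_TYPE_ROOT_PORT)
                          continue;
                  /* Claim native hotplug only if a slot advertises it */
                  pcie_capability_read_dword(rp, PCI_EXP_SLTCAP, &sltcap);
                  if (sltcap & PCI_EXP_SLTCAP_HPC)
                          return true;
          }
          return false;
  }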
I think 04b12ef163d1 would avoid it in the VMD domain, but your patch
would expose the hotplug issue.
Kai-Heng even says this in one of his responses here:
https://lore.kernel.org/linux-pci/CAAd53p6hATV8TOcJ9Qi2rMwVi=y_9+tQu6KhDkAm6Y8=cQ_xoA@xxxxxxxxxxxxxx/.
Quoting his reply: "To be more precise, AER is disabled by the platform
vendor in BIOS to paper over the issue."
I suspect there's a real hardware issue between the VMD and the
Samsung NVMe that causes these Correctable Errors. I think disabling
AER via _OSC is a bad way to work around it because:
- it disables AER for *everything* in domain 0000, when other
devices probably work perfectly fine,
- it assumes the OS vmd driver applies domain 0000 _OSC to the VMD
domain, which isn't required by any spec, and
- it disables *all* of AER, including Uncorrectable Errors, and I'd
like to know about those, even if we have to mask the Correctable
Errors.
In https://bugzilla.kernel.org/show_bug.cgi?id=215027#c5, Kai-Heng did
not see the Correctable Error flood when VMD was turned off and
concluded that the issue is VMD specific.
But I think it's likely that the errors still occur even when VMD is
turned off, and we just don't see the flood because AER is disabled.
I suggested an experiment with "pcie_ports=native", which should
enable AER even if _OSC doesn't grant ownership:
https://bugzilla.kernel.org/show_bug.cgi?id=215027#c9
I don't have a way to test his configuration, so that would be something
he would need to do.
Bjorn