On Wed, Jun 12, 2024 at 02:52:38PM -0700, Paul M Stillwell Jr wrote: > On 4/26/2024 2:46 PM, Paul M Stillwell Jr wrote: > > On 4/26/2024 2:36 PM, Bjorn Helgaas wrote: > > > On Thu, Apr 25, 2024 at 04:32:21PM -0700, Paul M Stillwell Jr wrote: > > > > On 4/25/2024 3:32 PM, Bjorn Helgaas wrote: > > > > > On Thu, Apr 25, 2024 at 02:43:07PM -0700, Paul M Stillwell Jr wrote: > > > > > > On 4/25/2024 10:24 AM, Bjorn Helgaas wrote: > > > > > > > On Wed, Apr 24, 2024 at 02:29:16PM -0700, Paul M Stillwell Jr wrote: > > > > > > > > On 4/23/2024 5:47 PM, Bjorn Helgaas wrote: > > > > > > ... > > > > > > > > > > > > > > _OSC is the only mechanism for negotiating ownership of these > > > > > > > > > features, and PCI Firmware r3.3, sec 4.5.1, > > > > > > > > > is pretty clear that _OSC > > > > > > > > > only applies to the hierarchy originated by > > > > > > > > > the PNP0A03/PNP0A08 host > > > > > > > > > bridge that contains the _OSC method. AFAICT, there's no > > > > > > > > > global/device-specific thing here. > > > > > > > > > > > > > > > > > > The BIOS may have a single user-visible > > > > > > > > > setting, and it may apply that > > > > > > > > > setting to all host bridge _OSC methods, but > > > > > > > > > that's just part of the > > > > > > > > > BIOS UI, not part of the firmware/OS interface. > > > > > > > > > > > > > > > > Fair, but we are still left with the question of > > > > > > > > how to set the _OSC bits > > > > > > > > for the VMD bridge. This would normally happen > > > > > > > > using ACPI AFAICT and we > > > > > > > > don't have that for the devices behind VMD. > > > > > > > > > > > > > > In the absence of a mechanism for negotiating > > > > > > > ownership, e.g., an ACPI > > > > > > > host bridge device for the hierarchy, the OS owns all the PCIe > > > > > > > features. > > > > > > > > > > > > I'm new to this space so I don't know what it means for the OS to > > > > > > own the features. In other words, how would the vmd driver figure > > > > > > out what features are supported? > > > > > > > > > > There are three things that go into this: > > > > > > > > > > - Does the OS support the feature, e.g., is CONFIG_PCIEAER enabled? > > > > > > > > > > - Has the platform granted permission to the OS to use the feature, > > > > > either explicitly via _OSC or implicitly because there's no > > > > > mechanism to negotiate ownership? > > > > > > > > > > - Does the device advertise the feature, e.g., does it have an AER > > > > > Capability? > > > > > > > > > > If all three are true, Linux enables the feature. > > > > > > > > > > I think vmd has implicit ownership of all features because there is no > > > > > ACPI host bridge device for the VMD domain, and (IMO) that means there > > > > > is no way to negotiate ownership in that domain. > > > > > > > > > > So the VMD domain starts with all the native_* bits set, meaning Linux > > > > > owns the features. If the vmd driver doesn't want some feature to be > > > > > used, it could clear the native_* bit for it. > > > > > > > > > > I don't think vmd should unilaterally claim ownership of features by > > > > > *setting* native_* bits because that will lead to conflicts with > > > > > firmware. > > > > > > > > This is the crux of the problem IMO. I'm happy to set the native_* bits > > > > using some knowledge about what the firmware wants, but we don't have a > > > > mechanism to do it AFAICT. I think that's what commit > > > > 04b12ef163d1 ("PCI: > > > > vmd: Honor ACPI _OSC on PCIe features") was trying to do: use a > > > > domain that > > > > ACPI had run on and negotiated features and apply them to the > > > > vmd domain. > > > > > > Yes, this is the problem. We have no information about what firmware > > > wants for the VMD domain because there is no ACPI host bridge device. > > > > > > We have information about what firmware wants for *other* domains. > > > 04b12ef163d1 assumes that also applies to the VMD domain, but I don't > > > think that's a good idea. > > > > > > > Using the 3 criteria you described above, could we do this for > > > > the hotplug > > > > feature for VMD: > > > > > > > > 1. Use IS_ENABLED(CONFIG_<whatever hotplug setting we need>) to > > > > check to see > > > > if the hotplug feature is enabled > > > > > > That's already there. > > > > > > > 2. We know that for VMD we want hotplug enabled so that is the implicit > > > > permission > > > > > > The VMD domain starts with all native_* bits set. All you have to do > > > is avoid *clearing* them. > > > > > > The problem (IMO) is that 04b12ef163d1 clears bits based on the _OSC > > > for some other domain. > > > > > > > 3. Look at the root ports below VMD and see if hotplug capability is set > > > > > > This is already there, too. > > > > > > > If 1 & 3 are true, then we set the native_* bits for hotplug (we > > > > can look > > > > for surprise hotplug as well in the capability to set the > > > > native_shpc_hotplug bit corrrectly) to 1. This feels like it > > > > would solve the > > > > problem of "what if there is a hotplug issue on the platform" > > > > because the > > > > user would have disabled hotplug for VMD and the root ports > > > > below it would > > > > have the capability turned off. > > > > > > > > In theory we could do this same thing for all the features, but we don't > > > > know what the firmware wants for features other than hotplug (because we > > > > implicitly know that vmd wants hotplug). I feel like > > > > 04b12ef163d1 is a good > > > > compromise for the other features, but I hear your issues with it. > > > > > > > > I'm happy to "do the right thing" for the other features, I just can't > > > > figure out what that thing is :) > > > > > > 04b12ef163d1 was motivated by a flood of Correctable Errors. > > > > > > Kai-Heng says the errors occur even when booting with > > > "pcie_ports=native" and VMD turned off, i.e., when the VMD RCiEP is > > > disabled and the NVMe devices appear under plain Root Ports in domain > > > 0000. That suggests that they aren't related to VMD at all. > > > > > > I think there's a significant chance that those errors are caused by a > > > software defect, e.g., ASPM configuration. There are many similar > > > reports of Correctable Errors where "pcie_aspm=off" is a workaround. > > > > > > If we can nail down the root cause of those Correctable Errors, we may > > > be able to fix it and just revert 04b12ef163d1. That would leave all > > > the PCIe features owned and enabled by Linux in the VMD domain. AER > > > would be enabled and not a problem, hotplug would be enabled as you > > > need, etc. > > > > > > There are a zillion reports of these errors and I added comments to > > > some to see if anybody can help us get to a root cause. > > > > OK, sounds like the plan for me is to sit tight for now WRT a patch to > > fix hotplug. I'll submit a v2 for the documentation. > > Any update on fixes for the errors? Ah, sorry, I need to get back to this. I don't think it's ASPM-related; I think it's just that disabling ASPM also causes AER to be disabled, so the errors probably still occur but we ignore them. Bjorn