On 5/2/2024 3:08 PM, Bjorn Helgaas wrote:
On Mon, Apr 08, 2024 at 11:39:27AM -0700, Paul M Stillwell Jr wrote:
Commit 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on PCIe features") added
code to copy the _OSC flags from the root bridge to the host bridge for each
vmd device because the AER bits were not being set correctly which was
causing an AER interrupt storm for certain NVMe devices.
This works fine in bare metal environments, but causes problems when the
vmd driver is run in a hypervisor environment. In a hypervisor all the
_OSC bits are 0 despite what the underlying hardware indicates. This is
a problem for vmd users because if vmd is enabled the user *always*
wants hotplug support enabled. To solve this issue the vmd driver always
enables the hotplug bits in the host bridge structure for each vmd.
Fixes: 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on PCIe features")
Signed-off-by: Nirmal Patel <nirmal.patel@xxxxxxxxxxxxxxx>
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@xxxxxxxxx>
---
drivers/pci/controller/vmd.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
index 87b7856f375a..583b10bd5eb7 100644
--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -730,8 +730,14 @@ static int vmd_alloc_irqs(struct vmd_dev *vmd)
static void vmd_copy_host_bridge_flags(struct pci_host_bridge *root_bridge,
struct pci_host_bridge *vmd_bridge)
{
- vmd_bridge->native_pcie_hotplug = root_bridge->native_pcie_hotplug;
- vmd_bridge->native_shpc_hotplug = root_bridge->native_shpc_hotplug;
+ /*
+ * there is an issue when the vmd driver is running within a hypervisor
+ * because all of the _OSC bits are 0 in that case. this disables
+ * hotplug support, but users who enable VMD in their BIOS always want
+ * hotplug suuport so always enable it.
+ */
+ vmd_bridge->native_pcie_hotplug = 1;
+ vmd_bridge->native_shpc_hotplug = 1;
Deferred for now because I think we need to figure out how to set all
these bits the same, or at least with a better algorithm than "here's
what we want in this environment."
Extended discussion about this at
https://lore.kernel.org/r/20240417201542.102-1-paul.m.stillwell.jr@xxxxxxxxx
That's ok by me. I thought where we left it was that if we could find a
solution to the Correctable Errors from the original issue that maybe we
could revert 04b12ef163d1.
I'm not sure I would know if a patch that fixes the Correctable Errors
comes in... We have a test case we would like to test against that was
pre 04b12ef163d1 (BIOS has AER disabled and we hotplug a disk which
results in AER interrupts) so we would be curious if the issues we saw
before goes away with a new patch for Correctable Errors.
Paul
vmd_bridge->native_aer = root_bridge->native_aer;
vmd_bridge->native_pme = root_bridge->native_pme;
vmd_bridge->native_ltr = root_bridge->native_ltr;
--
2.39.1