On 6/21/2024 2:16 PM, Bjorn Helgaas wrote:
On Thu, Jun 20, 2024 at 03:41:33PM -0700, Nirmal Patel wrote:
On Thu, 2 May 2024 17:56:08 -0500
Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
On Thu, May 02, 2024 at 03:38:00PM -0700, Paul M Stillwell Jr wrote:
On 5/2/2024 3:08 PM, Bjorn Helgaas wrote:
On Mon, Apr 08, 2024 at 11:39:27AM -0700, Paul M Stillwell Jr
wrote:
Commit 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on PCIe
features") added code to copy the _OSC flags from the root
bridge to the host bridge for each vmd device because the AER
bits were not being set correctly which was causing an AER
interrupt storm for certain NVMe devices.
This works fine in bare metal environments, but causes problems
when the vmd driver is run in a hypervisor environment. In a
hypervisor all the _OSC bits are 0 despite what the underlying
hardware indicates. This is a problem for vmd users because if
vmd is enabled the user *always* wants hotplug support enabled.
To solve this issue the vmd driver always enables the hotplug
bits in the host bridge structure for each vmd.
Fixes: 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on PCIe
features") Signed-off-by: Nirmal Patel
<nirmal.patel@xxxxxxxxxxxxxxx> Signed-off-by: Paul M Stillwell
Jr <paul.m.stillwell.jr@xxxxxxxxx> ---
drivers/pci/controller/vmd.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/controller/vmd.c
b/drivers/pci/controller/vmd.c index 87b7856f375a..583b10bd5eb7
100644 --- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -730,8 +730,14 @@ static int vmd_alloc_irqs(struct vmd_dev
*vmd) static void vmd_copy_host_bridge_flags(struct
pci_host_bridge *root_bridge, struct pci_host_bridge
*vmd_bridge) {
- vmd_bridge->native_pcie_hotplug =
root_bridge->native_pcie_hotplug;
- vmd_bridge->native_shpc_hotplug =
root_bridge->native_shpc_hotplug;
+ /*
+ * there is an issue when the vmd driver is running
within a hypervisor
+ * because all of the _OSC bits are 0 in that case.
this disables
+ * hotplug support, but users who enable VMD in their
BIOS always want
+ * hotplug suuport so always enable it.
+ */
+ vmd_bridge->native_pcie_hotplug = 1;
+ vmd_bridge->native_shpc_hotplug = 1;
Deferred for now because I think we need to figure out how to set
all these bits the same, or at least with a better algorithm than
"here's what we want in this environment."
Extended discussion about this at
https://lore.kernel.org/r/20240417201542.102-1-paul.m.stillwell.jr@xxxxxxxxx
That's ok by me. I thought where we left it was that if we could
find a solution to the Correctable Errors from the original issue
that maybe we could revert 04b12ef163d1.
I'm not sure I would know if a patch that fixes the Correctable
Errors comes in... We have a test case we would like to test
against that was pre 04b12ef163d1 (BIOS has AER disabled and we
hotplug a disk which results in AER interrupts) so we would be
curious if the issues we saw before goes away with a new patch for
Correctable Errors.
My current theory is that there's some issue with that particular
Samsung NVMe device that causes the Correctable Error flood. Kai-Heng
says they happen even with VMD disabled.
And there are other reports that don't seem to involve VMD but do
involve this NVMe device ([144d:a80a]):
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1852420
https://forums.unraid.net/topic/113521-constant-errors-on-logs-after-nvme-upgrade/
https://forums.unraid.net/topic/118286-nvme-drives-throwing-errors-filling-logs-instantly-how-to-resolve/
https://forum.proxmox.com/threads/pve-kernel-panics-on-reboots.144481/
https://www.eevblog.com/forum/general-computing/linux-mint-21-02-clone-replace-1tb-nvme-with-a-2tb-nvme/
NVMe has weird power management stuff, so it's always possible we're
doing something wrong in a driver.
But I think we really need to handle Correctable Errors better by:
- Possibly having drivers mask errors if they know about defects
- Making the log messages less alarming, e.g., a single line report
- Rate-limiting them so they're never overwhelming
- Maybe automatically masking them in the PCI core to avoid
interrupts
vmd_bridge->native_aer = root_bridge->native_aer;
vmd_bridge->native_pme = root_bridge->native_pme;
vmd_bridge->native_ltr = root_bridge->native_ltr;
Hi Bjorn,
Do we still expect to get this patch accepted?
Based on the previous comments, even with AER fixed we will still have
an issue in Guest OS of disabling all the features which will require
making adjustments and/or removing 04b12ef163d1.
Is it possible to accept this patch and add necessary changes
when AER fix is available?
I think 04b12ef163d1 is wrong, and we shouldn't copy any of the bits
from the root port.
If vmd takes ownership of all those features, you can decide what to
do with AER, and you can disable it if you want to.
Bjorn
IIRC we are taking ownership of the flags by (eventually) calling
pci_init_host_bridge() which sets the native_* bits, right?
One way we could force a re-look at the original issue that 04b12ef163d1
fixed would be to revert it and break those use cases. I'm not sure I'm
in favor of that, but right now we have a situation where either VMD is
broken in some cases or we revert 04b12ef163d1 and break those cases.
Either way someone is not happy.
Paul