On 7/13/2023 12:26 PM, Jonathan Derrick wrote: > > > On 7/13/2023 10:58 AM, Bjorn Helgaas wrote: >> [+cc Jonathan, Lorenzo, Krzysztof, Rob (from MAINTAINERS)] >> >> Can you make the subject line say what the patch does? Repeating >> "VMD" is probably unnecessary. >> >> On Wed, Jul 05, 2023 at 01:20:38PM -0400, Nirmal Patel wrote: >>> The hotplug functionality is broken in various combinations of guest >>> OSes i.e. RHEL, SlES and hypervisors i.e. KVM and ESXI. >> >> "SLES" >> >>> VMD enabled on Intel ADL cpus caused interrupt storm for smasung >>> drives due to AER being enabled on VMD controlled root ports. >> >> "Samsung" >> >> Enabling AER should not cause an interrupt storm. There must be >> something else going on in addition. Are you saying the Samsung >> drives have some AER-related defect, like generating Correctable >> Errors when they shouldn't? It doesn't sound like "Intel ADL" would >> be relevant here. >> >> It's not clear to me if this is directly related to *hotplug* or if >> it's an AER issue that may happen because of hotplug and possible for >> other reasons. >> >>> The patch 04b12ef163d10e348db664900ae7f611b83c7a0e >> >> 12 char SHA1 is sufficient. >> >>> ("PCI: vmd: Honor APCI _OSC on PCIe features.") was added to the VMD >>> driver to correct the issue based on the following assumption: >>> “Since VMD is an aperture to regular PCIe root ports, honor ACPI >>> _OSC to disable PCIe features accordingly to resolve the issue.” >>> Link: https://lore.kernel.org/r/20211203031541.1428904-1-kai.heng.feng@xxxxxxxxxxxxx >>> >>> VMD as a PCIe device is an end point(type 0), not a PCIe aperture >>> (pcie bridge). In fact VMD is a type 0 raid controller(class code). >>> When BIOS boots, all root ports under VMD is inaccessible by BIOS, and >>> as such, they maintain their power on default states. The VMD UEFI DXE >>> driver loads and configure all devices under VMD. This is how AER, >>> power management and hotplug gets enabled in UEFI, since the BIOS pci >>> driver cannot access the root ports. >> >> s/pcie/PCIe/ >> s/pci/PCI/ >> s/raid/RAID/ >> >>> The patch worked around the interrupt storm by assigning the native >> >> I assume "the patch" means 04b12ef163d1. Sometimes people write "the >> patch does X" in the commit log for the current patch, so it's nice to >> be specific to avoid confusion. >> >>> ACPI states to the root ports under VMD. It assigns AER, hotplug, >>> PME, etc. These have been restored back to the power on default state >>> in guest OS, which says the root port hot plug enable is default OFF. >>> At most, the work around should have only assigned AER state. >>> An additional patch should be added to exclude hot plug from the >>> original patch. >> >> Add blank line between paragraphs. >> >>> This will cause hot plug to start working again in the guest, as the >>> settings implemented by the UEFI VMD DXE driver will remain in effect >>> in Linux. > So hotplug will work in the guest per UEFI VMD DXE driver, but AER will be determined by the host native_aer state? Yes. Guest doesn't have UEFI driver to configure rootports as in case of Host. > >> >> This is a lot of description that doesn't seem to say what the actual >> underlying issue is. It's basically "if we treat hotplug differently >> from other negotiated features, things work better." And it seems >> like it depends on what the UEFI driver did, and we should try to >> avoid a dependency there. >> >> If the issue is too many Correctable Errors being reported via AER, we >> have that problem regardless of VMD, and we should handle it by >> rate-limiting and/or suppressing them completely for particularly >> offensive devices. We have some issues like this that are still >> outstanding: > I'm curious if Non-VMD root ports on the system see the same CE storm. > Either way, rate-limiting per device seems like a good idea. AER is not the issue here and we can leave them as it is. I added unnecessary description which caused more confusion. My apologies. > >> >> [1] https://lore.kernel.org/linux-pci/DM6PR04MB6473197DBD89FF4643CC391F8BC19@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/ >> [2] https://lore.kernel.org/linux-pci/20230606035256.2886098-2-grundler@xxxxxxxxxxxx/ >> >>> Signed-off-by: Nirmal Patel <nirmal.patel@xxxxxxxxxxxxxxx> >>> --- >>> drivers/pci/controller/vmd.c | 2 -- >>> 1 file changed, 2 deletions(-) >>> >>> diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c >>> index 769eedeb8802..52c2461b4761 100644 >>> --- a/drivers/pci/controller/vmd.c >>> +++ b/drivers/pci/controller/vmd.c >>> @@ -701,8 +701,6 @@ static int vmd_alloc_irqs(struct vmd_dev *vmd) >>> static void vmd_copy_host_bridge_flags(struct pci_host_bridge *root_bridge, >>> struct pci_host_bridge *vmd_bridge) >>> { >>> - vmd_bridge->native_pcie_hotplug = root_bridge->native_pcie_hotplug; >>> - vmd_bridge->native_shpc_hotplug = root_bridge->native_shpc_hotplug; >>> vmd_bridge->native_aer = root_bridge->native_aer; >>> vmd_bridge->native_pme = root_bridge->native_pme; >>> vmd_bridge->native_ltr = root_bridge->native_ltr; >>> -- >>> 2.31.1 >>>