Re: [PATCH] PCI: vmd: VMD to control Hotplug on its rootports

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 7/13/2023 12:26 PM, Jonathan Derrick wrote:
>
>
> On 7/13/2023 10:58 AM, Bjorn Helgaas wrote:
>> [+cc Jonathan, Lorenzo, Krzysztof, Rob (from MAINTAINERS)]
>>
>> Can you make the subject line say what the patch does?  Repeating
>> "VMD" is probably unnecessary.
>>
>> On Wed, Jul 05, 2023 at 01:20:38PM -0400, Nirmal Patel wrote:
>>> The hotplug functionality is broken in various combinations of guest
>>> OSes i.e. RHEL, SlES and hypervisors i.e. KVM and ESXI.
>>
>> "SLES"
>>
>>> VMD enabled on Intel ADL cpus caused interrupt storm for smasung
>>> drives due to AER being enabled on VMD controlled root ports.
>>
>> "Samsung"
>>
>> Enabling AER should not cause an interrupt storm.  There must be
>> something else going on in addition.  Are you saying the Samsung
>> drives have some AER-related defect, like generating Correctable
>> Errors when they shouldn't?  It doesn't sound like "Intel ADL" would
>> be relevant here.
>>
>> It's not clear to me if this is directly related to *hotplug* or if
>> it's an AER issue that may happen because of hotplug and possible for
>> other reasons.
>>
>>> The patch 04b12ef163d10e348db664900ae7f611b83c7a0e
>>
>> 12 char SHA1 is sufficient.
>>
>>> ("PCI: vmd: Honor APCI _OSC on PCIe features.") was added to the VMD
>>> driver to correct the issue based on the following assumption:
>>>     “Since VMD is an aperture to regular PCIe root ports, honor ACPI
>>>     _OSC to disable PCIe features accordingly to resolve the issue.”
>>>     Link: https://lore.kernel.org/r/20211203031541.1428904-1-kai.heng.feng@xxxxxxxxxxxxx
>>>
>>> VMD as a PCIe device is an end point(type 0), not a PCIe aperture
>>> (pcie bridge). In fact VMD is a type 0 raid controller(class code).
>>> When BIOS boots, all root ports under VMD is inaccessible by BIOS, and
>>> as such, they maintain their power on default states. The VMD UEFI DXE
>>> driver loads and configure all devices under VMD. This is how AER,
>>> power management and hotplug gets enabled in UEFI, since the BIOS pci
>>> driver cannot access the root ports.
>>
>> s/pcie/PCIe/
>> s/pci/PCI/
>> s/raid/RAID/
>>
>>> The patch worked around the interrupt storm by assigning the native
>>
>> I assume "the patch" means 04b12ef163d1.  Sometimes people write "the
>> patch does X" in the commit log for the current patch, so it's nice to
>> be specific to avoid confusion.
>>
>>> ACPI states to the  root ports under VMD. It assigns AER, hotplug,
>>> PME, etc. These have been restored back to the power on default state
>>> in guest OS, which says the root port hot plug enable is default OFF.
>>> At most, the work around should have only assigned AER state.
>>> An additional patch should be added to exclude hot plug from the
>>> original patch.
>>
>> Add blank line between paragraphs.
>>
>>> This will cause hot plug to start working again in the guest, as the
>>> settings implemented by the UEFI VMD DXE driver will remain in effect
>>> in Linux.
> So hotplug will work in the guest per UEFI VMD DXE driver, but AER will be determined by the host native_aer state?

Yes. Guest doesn't have UEFI driver to configure rootports as in case of Host.

>
>>
>> This is a lot of description that doesn't seem to say what the actual
>> underlying issue is.  It's basically "if we treat hotplug differently
>> from other negotiated features, things work better."  And it seems
>> like it depends on what the UEFI driver did, and we should try to
>> avoid a dependency there.
>>
>> If the issue is too many Correctable Errors being reported via AER, we
>> have that problem regardless of VMD, and we should handle it by
>> rate-limiting and/or suppressing them completely for particularly
>> offensive devices.  We have some issues like this that are still
>> outstanding:
> I'm curious if Non-VMD root ports on the system see the same CE storm.
> Either way, rate-limiting per device seems like a good idea.

AER is not the issue here and we can leave them as it is.
I added unnecessary description which caused more confusion.
My apologies.

>
>>
>> [1] https://lore.kernel.org/linux-pci/DM6PR04MB6473197DBD89FF4643CC391F8BC19@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
>> [2] https://lore.kernel.org/linux-pci/20230606035256.2886098-2-grundler@xxxxxxxxxxxx/
>>
>>> Signed-off-by: Nirmal Patel <nirmal.patel@xxxxxxxxxxxxxxx>
>>> ---
>>>   drivers/pci/controller/vmd.c | 2 --
>>>   1 file changed, 2 deletions(-)
>>>
>>> diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
>>> index 769eedeb8802..52c2461b4761 100644
>>> --- a/drivers/pci/controller/vmd.c
>>> +++ b/drivers/pci/controller/vmd.c
>>> @@ -701,8 +701,6 @@ static int vmd_alloc_irqs(struct vmd_dev *vmd)
>>>   static void vmd_copy_host_bridge_flags(struct pci_host_bridge *root_bridge,
>>>                          struct pci_host_bridge *vmd_bridge)
>>>   {
>>> -    vmd_bridge->native_pcie_hotplug = root_bridge->native_pcie_hotplug;
>>> -    vmd_bridge->native_shpc_hotplug = root_bridge->native_shpc_hotplug;
>>>       vmd_bridge->native_aer = root_bridge->native_aer;
>>>       vmd_bridge->native_pme = root_bridge->native_pme;
>>>       vmd_bridge->native_ltr = root_bridge->native_ltr;
>>> -- 
>>> 2.31.1
>>>




[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux