Re: [PATCH] PCI: vmd: Enable Hotplug based on BIOS setting on VMD rootports

Hi Nirmal,

On Wed, Nov 15, 2023 at 5:00 AM Nirmal Patel
<nirmal.patel@xxxxxxxxxxxxxxx> wrote:
>
> On Wed, 2023-11-08 at 16:49 +0200, Kai-Heng Feng wrote:
> > On Wed, Nov 8, 2023 at 12:30 AM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > > [+cc Rafael, just FYI re 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC
> > > on PCIe features")]
> > >
> > > On Tue, Nov 07, 2023 at 02:50:57PM -0700, Nirmal Patel wrote:
> > > > On Thu, 2023-11-02 at 16:49 -0700, Nirmal Patel wrote:
> > > > > On Thu, 2023-11-02 at 15:41 -0500, Bjorn Helgaas wrote:
> > > > > > On Thu, Nov 02, 2023 at 01:07:03PM -0700, Nirmal Patel wrote:
> > > > > > > On Wed, 2023-11-01 at 17:20 -0500, Bjorn Helgaas wrote:
> > > > > > > > > On Tue, Oct 31, 2023 at 12:59:34PM -0700, Nirmal Patel wrote:
> > > > > > > > > On Tue, 2023-10-31 at 10:31 -0500, Bjorn Helgaas wrote:
> > > > > > > > > > > On Mon, Oct 30, 2023 at 04:16:54PM -0400, Nirmal Patel wrote:
> > > > > > > > > > > > VMD Hotplug should be enabled or disabled based on VMD
> > > > > > > > > > > > rootports' Hotplug configuration in BIOS. is_hotplug_bridge
> > > > > > > > > > > > is set on each VMD rootport based on Hotplug capable bit in
> > > > > > > > > > > > SltCap in probe.c.  Check is_hotplug_bridge and enable or
> > > > > > > > > > > > disable native_pcie_hotplug based on that value.
> > > > > > > > > > > >
> > > > > > > > > > > > Currently VMD driver copies ACPI settings or platform
> > > > > > > > > > > > configurations for Hotplug, AER, DPC, PM, etc and enables or
> > > > > > > > > > > > disables these features on VMD bridge which is not correct
> > > > > > > > > > > > in case of Hotplug.
> > > > > > > > > >
> > > > > > > > > > > This needs some background about why it's correct to copy the
> > > > > > > > > > > ACPI settings in the case of AER, DPC, PM, etc, but incorrect
> > > > > > > > > > > for hotplug.
> > > > > > > > > >
> > > > > > > > > > > > Also during the Guest boot up, ACPI settings along with VMD
> > > > > > > > > > > > UEFI driver are not present in Guest BIOS which results in
> > > > > > > > > > > > assigning default values to Hotplug, AER, DPC, etc. As a
> > > > > > > > > > > > result Hotplug is disabled on VMD in the Guest OS.
> > > > > > > > > > > >
> > > > > > > > > > > > This patch will make sure that Hotplug is enabled properly
> > > > > > > > > > > > in Host as well as in VM.
> > > > > > > > > >
> > > > > > > > > > > Did we come to some consensus about how or whether _OSC for
> > > > > > > > > > > the host bridge above the VMD device should apply to devices
> > > > > > > > > > > in the separate domain below the VMD?
> > > > > > > > >
> > > > > > > > > > We are not able to come to any consensus. Someone suggested to
> > > > > > > > > > copy either all _OSC flags or none. But logic behind that
> > > > > > > > > > assumption is that the VMD is a bridge device which is not
> > > > > > > > > > completely true. VMD is an endpoint device and it owns its
> > > > > > > > > > domain.
> > > > > > > >
> > > > > > > > > Do you want to facilitate a discussion in the PCI firmware SIG
> > > > > > > > > about this?  It seems like we may want a little text in the spec
> > > > > > > > > about how to handle this situation so platforms and OSes have the
> > > > > > > > > same expectations.
> > > > > > >
> > > > > > > > The patch 04b12ef163d1 broke intel VMD's hotplug capabilities and
> > > > > > > > author did not test in VM environment impact.
> > > > > > > > We can resolve the issue easily by
> > > > > > > >
> > > > > > > > #1 Revert the patch which means restoring VMD's original functionality
> > > > > > > > and author provide better fix.
> > > > > > > >
> > > > > > > > or
> > > > > > > >
> > > > > > > > #2 Allow the current change to re-enable VMD hotplug inside VMD
> > > > > > > > driver.
> > > > > > > >
> > > > > > > > There is a significant impact for our customers hotplug use cases which
> > > > > > > > forces us to apply the fix in out-of-box drivers for different OSs.
> > > > > >
> > > > > > > I agree 100% that there's a serious problem here and we need to fix
> > > > > > > it, there's no argument there.
> > > > > > >
> > > > > > > I guess you're saying it's obvious that an _OSC above VMD does not
> > > > > > > apply to devices below VMD, and therefore, no PCI Firmware SIG
> > > > > > > discussion or spec clarification is needed?
> > > > >
> > > > > > Yes. By design VMD is an endpoint device to OS and its domain is
> > > > > > privately owned by VMD only. I believe we should revert back to
> > > > > > original design and not impose _OSC settings on VMD domain which is
> > > > > > also a maintainable solution.
> > > >
> > > > I will send out revert patch. The _OSC settings shouldn't apply
> > > > to private VMD domain.
> > >
> > > > I assume you mean to revert 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC
> > > > on PCIe features").  That appeared in v5.17, and it fixed (or at least
> > > > prevented) an AER message flood.  We can't simply revert 04b12ef163d1
> > > > unless we first prevent that AER message flood in another way.
> >
> > The error is "correctable".
> > Does masking all correctable AER error by default make any sense? And
> > add a sysfs knob to make it optional.
> I assume sysfs knob requires driver reload. right? Can you send a
> patch?

What I mean is to mask Correctable Errors by default on *all*
rootports, and to create a new sysfs knob that lets the user decide
whether Correctable Errors should be unmasked.
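
To make the idea concrete, here is a rough userspace model, not kernel
code and not the proposed patch. It only illustrates the bit
manipulation on the AER Correctable Error Mask register. The bit values
are meant to mirror the PCI_ERR_COR_* definitions in pci_regs.h; the
struct and function names (fake_root_port, root_port_set_cor_reporting,
root_port_reports) are hypothetical and exist only for this sketch. How
the knob would actually be exposed in sysfs is left open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Correctable error bits, intended to mirror PCI_ERR_COR_* in pci_regs.h. */
#define AER_COR_RCVR      0x00000001u /* Receiver Error */
#define AER_COR_BAD_TLP   0x00000040u /* Bad TLP */
#define AER_COR_BAD_DLLP  0x00000080u /* Bad DLLP */
#define AER_COR_REP_ROLL  0x00000100u /* REPLAY_NUM Rollover */
#define AER_COR_REP_TIMER 0x00001000u /* Replay Timer Timeout */
#define AER_COR_ADV_NFAT  0x00002000u /* Advisory Non-Fatal */
#define AER_COR_INTERNAL  0x00004000u /* Corrected Internal Error */
#define AER_COR_LOG_OVER  0x00008000u /* Header Log Overflow */

#define AER_COR_ALL (AER_COR_RCVR | AER_COR_BAD_TLP | AER_COR_BAD_DLLP | \
                     AER_COR_REP_ROLL | AER_COR_REP_TIMER |              \
                     AER_COR_ADV_NFAT | AER_COR_INTERNAL | AER_COR_LOG_OVER)

/* Hypothetical stand-in for a root port; models only the mask register. */
struct fake_root_port {
	uint32_t aer_cor_mask; /* a set bit means that error is masked */
};

/*
 * Models what the proposed knob would toggle: report == false masks
 * every correctable error (the proposed default), report == true
 * unmasks them again. Bits outside AER_COR_ALL are left untouched.
 */
void root_port_set_cor_reporting(struct fake_root_port *port, bool report)
{
	assert(port != NULL);

	if (report)
		port->aer_cor_mask &= ~AER_COR_ALL;
	else
		port->aer_cor_mask |= AER_COR_ALL;
}

/* True if the given correctable error bit is currently unmasked. */
bool root_port_reports(const struct fake_root_port *port, uint32_t err_bit)
{
	assert(port != NULL);

	return (port->aer_cor_mask & err_bit) == 0;
}
```

In this model the default would be root_port_set_cor_reporting(&port,
false) for every rootport, and writing to the knob would call it with
true for the ports the user cares about.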

I can send a patch, but of course I'd like to hear what Bjorn thinks
about this approach.

Kai-Heng

>
> nirmal
> >
> > Kai-Heng
> >
> > > Bjorn
> > >
> > > > Even the patch 04b12ef163d1 needs more changes to make sure _OSC
> > > > settings are passed on from Host BIOS to Guest BIOS which means
> > > > involvement of ESXi, Windows HyperV, KVM.
>




