Re: [PATCH v2] PCI: vmd: Enable Hotplug based on BIOS setting on VMD rootports

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Dec 05, 2023 at 03:20:27PM -0700, Nirmal Patel wrote:
> On Fri, 2023-12-01 at 18:07 -0600, Bjorn Helgaas wrote:
> > On Mon, Nov 27, 2023 at 04:17:29PM -0500, Nirmal Patel wrote:
> > > Currently VMD copies root bridge setting to enable Hotplug on its
> > > rootports. This mechanism works fine for Host OS and no issue has
> > > been observed. However in case of VM, all the HyperVisors don't
> > > pass the Hotplug setting to the guest BIOS which results in
> > > assigning default values and disabling Hotplug capability in the
> > > guest which have been observed by many OEMs.
> > 
> > Can we make this a little more specific?  I'm lacking a lot of
> > virtualization background here, so I'm going to ask a bunch of stupid
> > questions here:
> > 
> > "VMD copies root bridge setting" refers to _OSC and the copying done
> > in vmd_copy_host_bridge_flags()?  Obviously this must be in the Host
> > OS.
>
> Yes. we are talking about vmd_copy_host_bridge_flags. It gets called in
> both Host and Guest. But it assigns different values. In Host, it
> assigns correct values, while in guest it assigns power-on default
> value 0 which in simple term disable everything including hotplug.

If vmd_copy_host_bridge_flags() is called both in the Host and the
Guest, I guess that means both the VMD endpoint and the VMD Root Ports
below it are passed through to the Guest?  I expected only the VMD
Root Ports would be passed through, but maybe that's my
misunderstanding.

The VMD endpoint is under host bridge A, and the VMD creates a new PCI
domain under a new host bridge B.  vmd_copy_host_bridge_flags() copies
all the feature ownership flags from A to B.

On ACPI systems (like both Host and Guest) all these flags are
initially cleared for host bridge A in acpi_pci_root_create() because
these features are owned by the platform until _OSC grants ownership
to the OS.  They are initially *set* for host bridge B in
pci_init_host_bridge() because it's created by the vmd driver instead
of being enumerated via ACPI.

If the Host _OSC grants ownership to the OS, A's native_pcie_hotplug
will be set, and vmd_copy_host_bridge_flags() leaves it set for B as
well.  If the Host _OSC does *not* grant hotplug ownership to the OS,
native_pcie_hotplug will be cleared for both A and B.

Apparently the Guest doesn't supply _OSC (seems like a spec violation;
could tell from the dmesg), so A's native_pcie_hotplug remains
cleared, which means vmd_copy_host_bridge_flags() will also clear it
for B.

[The lack of a Guest BIOS _OSC would also mean that if you passed
through a normal non-VMD PCIe Switch Downstream Port to the Guest, the
Guest OS would never be able to manage hotplug below that Port.  Maybe
nobody passes through Switch Ports.]

> > "This mechanism works fine for Host OS and no issue has been
> > observed."  I guess this means that if the platform grants hotplug
> > control to the Host OS via _OSC, pciehp in the Host OS works fine to
> > manage hotplug on Root Ports below the VMD?
>
> Yes. When platform hotplug setting is enabled, then _OSC assigns
> correct value to vmd root ports via vmd_copy_host_bridge_flags.
>
> > If the platform does *not* grant hotplug control to the Host OS, I
> > guess it means the Host OS pciehp can *not* manage hotplug below VMD?
> > Is that also what you expect?
> > 
> > "However in case of VM, all the HyperVisors don't pass the Hotplug
> > setting to the guest BIOS"  What is the mechanism by which they would
> > pass the hotplug setting?  I guess the guest probably doesn't see the
> > VMD endpoint itself, right?  The guest sees either virtualized or
> > passed-through VMD Root Ports?
>
> In guest, vmd is passthrough including pci topology behind vmd. The way
> we verify Hotplug capability is to read Slot Capabilities registers'
> hotplug enabled bit set on vmd root ports in Guest.

I guess maybe this refers to set_pcie_hotplug_bridge(), which checks
PCI_EXP_SLTCAP_HPC?  If it's not set, pciehp won't bind to the device.
This behavior is the same in Host and Guest.

> The values copied in vmd_copy_host_bridge_flags are different in Host
> than in Guest. I do not know what component is responsible for this in
> every HyperVisor. But I tested this in ESXi, HyperV, KVM and they all
> had the same behavior where the _OSC flags are set to power-on default
> values of 0 by vmd_copy_host_bridge_flags in Guest OS which is
> disabling hotplug.
>
> > I assume the guest OS sees a constructed ACPI system (different from
> > the ACPI environment the host sees), so it sees a PNP0A03 host bridge
> > with _OSC?  And presumably VMD Root Ports below that host bridge?
> > 
> > So are you saying the _OSC seen by the guest doesn't grant hotplug
> > control to the guest OS?  And it doesn't do that because the guest
> > BIOS hasn't implemented _OSC?  Or maybe the guest BIOS relies on the
> > Slot Capabilities registers of VMD Root Ports to decide whether _OSC
> > should grant hotplug control?  And those Slot Capabilities don't
> > advertise hotplug support?
>
> Currently Hotplug is controlled by _OSC in both host and Guest. In case
> of guest, it seems guest BIOS hasn't implemented _OSC since all the
> flags are set to 0 which is not the case in host.

I think you want pciehp to work on the VMD Root Ports in the Guest,
no matter what, on the assumption that any _OSC that applies to host
bridge A does not apply to the host bridge created by the vmd driver.

If so, we should just revert 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC
on PCIe features").  Since host bridge B was not enumerated via ACPI,
the OS owns all those features under B by default, so reverting
04b12ef163d1 would leave that state.

Obviously we'd have to first figure out another solution for the AER
message flood that 04b12ef163d1 resolved.

> VMD is passthrough in Guest so the slot capabilities registers are
> still same as what Host driver would see. That is how we can still
> control hotplug on vmd on both Guest and Host.
> > 
> > "... which results in assigning default values and disabling Hotplug
> > capability in the guest."  What default values are these?
>
> 0. Everything is disabled in guest. So basically _OSC is writing wrong
> values in guest.
>
> > Is this talking about the default host_bridge->native_pcie_hotplug
> > value set in acpi_pci_root_create(), where the OS assumes hotplug
> > is owned by the platform unless _OSC grants control to the OS?
>
> vmd driver calls pci_create_root_bus which eventually ends up
> calling pci_init_host_bridge where native_pcie_hotplug is set to 1.
> Now vmd_copy_host_bridge_flags overwrites the native_pcie_hotplug to
> _OSC setting of 0 in guest.
>
> > > VMD Hotplug can be enabled or disabled based on the VMD
> > > rootports' Hotplug configuration in BIOS. is_hotplug_bridge is
> > > set on each VMD rootport based on Hotplug capable bit in SltCap
> > > in probe.c.  Check is_hotplug_bridge and enable or disable
> > > native_pcie_hotplug based on that value.
> > > 
> > > This patch will make sure that Hotplug is enabled properly in
> > > Host as well as in VM while honoring _OSC settings as well as
> > > VMD hotplug setting.
> > 
> > "Hotplug is enabled properly in Host as well as in VM" sounds like
> > we're talking about both the host OS and the guest OS.
>
> Yes. VM means guest.
> > 
> > In the host OS, this overrides whatever was negotiated via _OSC, I
> > guess on the principle that _OSC doesn't apply because the host BIOS
> > doesn't know about the Root Ports below the VMD.  (I'm not sure it's
> > 100% resolved that _OSC doesn't apply to them, so we should mention
> > that assumption here.)
>
> _OSC still controls every flag including Hotplug. I have observed
> that slot capabilities register has hotplug enabled only when
> platform has enabled the hotplug. So technically not overriding it
> in the host.
>
> It overrides in guest because _OSC is passing wrong/different
> information than the _OSC information in Host.  Also like I
> mentioned, slot capabilities registers are not changed in Guest
> because vmd is passthrough.
> > 
> > In the guest OS, this still overrides whatever was negotiated via
> > _OSC, but presumably the guest BIOS *does* know about the Root
> > Ports, so I don't think the same principle about overriding _OSC
> > applies there.
> > 
> > I think it's still a problem that we handle
> > host_bridge->native_pcie_hotplug differently than all the other
> > features negotiated via _OSC.  We should at least explain this
> > somehow.
>
> Right now this is the only way to know in Guest OS if platform has
> enabled Hotplug or not. We have many customers complaining about
> Hotplug being broken in Guest. So just trying to honor _OSC but also
> fixing its limitation in Guest.
> > 
> > I suspect part of the reason for doing it differently is to avoid
> > the interrupt storm that we avoid via 04b12ef163d1 ("PCI: vmd:
> > Honor ACPI _OSC on PCIe features").  If so, I think 04b12ef163d1
> > should be mentioned here because it's an important piece of the
> > picture.
>
> I can add a note about this patch as well. Let me know if you want
> me to add something specific.
> 
> Thank you for taking the time. This is a very critical issue for us
> and we have many customers complaining about it, I would appreciate
> to get it resolved as soon as possible.

> > > Signed-off-by: Nirmal Patel <nirmal.patel@xxxxxxxxxxxxxxx>
> > > ---
> > > v1->v2: Updating commit message.
> > > ---
> > >  drivers/pci/controller/vmd.c | 11 ++++++++++-
> > >  1 file changed, 10 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/pci/controller/vmd.c
> > > b/drivers/pci/controller/vmd.c
> > > index 769eedeb8802..e39eaef5549a 100644
> > > --- a/drivers/pci/controller/vmd.c
> > > +++ b/drivers/pci/controller/vmd.c
> > > @@ -720,6 +720,7 @@ static int vmd_enable_domain(struct vmd_dev
> > > *vmd, unsigned long features)
> > >  	resource_size_t membar2_offset = 0x2000;
> > >  	struct pci_bus *child;
> > >  	struct pci_dev *dev;
> > > +	struct pci_host_bridge *vmd_bridge;
> > >  	int ret;
> > >  
> > >  	/*
> > > @@ -886,8 +887,16 @@ static int vmd_enable_domain(struct vmd_dev
> > > *vmd, unsigned long features)
> > >  	 * and will fail pcie_bus_configure_settings() early. It can
> > > instead be
> > >  	 * run on each of the real root ports.
> > >  	 */
> > > -	list_for_each_entry(child, &vmd->bus->children, node)
> > > +	vmd_bridge = to_pci_host_bridge(vmd->bus->bridge);
> > > +	list_for_each_entry(child, &vmd->bus->children, node) {
> > >  		pcie_bus_configure_settings(child);
> > > +		/*
> > > +		 * When Hotplug is enabled on vmd root-port, enable it
> > > on vmd
> > > +		 * bridge.
> > > +		 */
> > > +		if (child->self->is_hotplug_bridge)
> > > +			vmd_bridge->native_pcie_hotplug = 1;
> > 
> > "is_hotplug_bridge" basically means PCI_EXP_SLTCAP_HPC is set, which
> > means "the hardware of this slot is hot-plug *capable*."
> > 
> > Whether hotplug is *enabled* is a separate policy decision about
> > whether the OS is allowed to operate the hotplug functionality, so I
> > think saying "When Hotplug is enabled" in the comment is a little bit
> > misleading.
>
> I will rewrite the comment.
> > 
> > Bjorn
> > 
> > > +	}
> > >  
> > >  	pci_bus_add_devices(vmd->bus);
> > >  
> > > -- 
> > > 2.31.1
> > > 
> 




[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux