On 4/22/2024 1:27 PM, Bjorn Helgaas wrote:
On Fri, Apr 19, 2024 at 03:18:19PM -0700, Paul M Stillwell Jr wrote:
On 4/19/2024 2:14 PM, Bjorn Helgaas wrote:
On Thu, Apr 18, 2024 at 02:51:19PM -0700, Paul M Stillwell Jr wrote:
On 4/18/2024 11:26 AM, Bjorn Helgaas wrote:
On Wed, Apr 17, 2024 at 01:15:42PM -0700, Paul M Stillwell Jr wrote:
Adding documentation for the Intel VMD driver and updating the index
file to include it.
- Which devices are passed through to a virtual guest and enumerated
there?
All devices under VMD are passed to a virtual guest.
So the guest will see the VMD Root Ports, but not the VMD RCiEP
itself?
The guest will see the VMD device, and then the vmd driver in the guest will
enumerate the devices behind it, as I understand it.
- Where does the vmd driver run (host or guest or both)?
I believe the answer is both.
If the VMD RCiEP isn't passed through to the guest, how can the vmd
driver do anything in the guest?
The VMD device is passed through to the guest. It works just like bare metal
in that the guest OS detects the VMD device and loads the vmd driver, which
then enumerates the devices behind it into the guest.
I guess it's obvious that the VMD RCiEP must be passed through to the
guest because the whole point of
https://lore.kernel.org/linux-pci/20240408183927.135-1-paul.m.stillwell.jr@xxxxxxxxx/
is to do something in the guest.
It does puzzle me that we have two copies of the vmd driver (one in
the host OS and another in the guest OS) that think they own the same
physical device. I'm not a virtualization guru, but that sounds
potentially problematic.
IIUC, the current situation is "regardless of what firmware said, in
the VMD domain we want AER disabled and hotplug enabled."
We aren't saying we want AER disabled; we are just saying we want hotplug
enabled. The observation is that in a hypervisor scenario AER is going to be
disabled because the _OSC bits are all 0.
04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on PCIe features") is saying
we want AER disabled for the VMD domain, isn't it?
I don't see it that way, but maybe I'm misunderstanding something. Here
is the code from that commit (only the portion of interest):
+/*
+ * Since VMD is an aperture to regular PCIe root ports, only allow it to
+ * control features that the OS is allowed to control on the physical
+ * PCI bus.
+ */
+static void vmd_copy_host_bridge_flags(struct pci_host_bridge *root_bridge,
+                                       struct pci_host_bridge *vmd_bridge)
+{
+        vmd_bridge->native_pcie_hotplug = root_bridge->native_pcie_hotplug;
+        vmd_bridge->native_shpc_hotplug = root_bridge->native_shpc_hotplug;
+        vmd_bridge->native_aer = root_bridge->native_aer;
+        vmd_bridge->native_pme = root_bridge->native_pme;
+        vmd_bridge->native_ltr = root_bridge->native_ltr;
+        vmd_bridge->native_dpc = root_bridge->native_dpc;
+}
+
When I look at this, we are copying whatever is in the root_bridge to
the vmd_bridge. In a bare metal scenario this is correct, and AER will
be whatever the BIOS set up (IIRC on my bare metal system AER is
enabled). In a hypervisor scenario the root_bridge bits will all be 0,
which effectively disables AER and all the similar bits.

Prior to this commit, all the native_* bits were set to 1 because
pci_init_host_bridge() sets them all to 1, so we were enabling AER et al.
despite what the OS may have wanted. With the commit we are setting the
bits to whatever the BIOS has set them to.
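
For reference, those defaults come from pci_init_host_bridge(); the snippet
below is paraphrased/trimmed from drivers/pci/probe.c (not the exact code),
just to show what the VMD domain started with before 04b12ef163d1:

/* Paraphrased/trimmed from pci_init_host_bridge(struct pci_host_bridge *bridge):
 * every new host bridge starts with the OS assumed to own all of these
 * services.  Nothing cleared them for the VMD domain before 04b12ef163d1.
 */
bridge->native_pcie_hotplug = 1;
bridge->native_shpc_hotplug = 1;
bridge->native_aer = 1;
bridge->native_pme = 1;
bridge->native_ltr = 1;
bridge->native_dpc = 1;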
It seems like the only clear option is to say "the vmd driver owns all
PCIe services in the VMD domain, the platform does not supply _OSC for
the VMD domain, the platform can't do anything with PCIe services in
the VMD domain, and the vmd driver needs to explicitly enable/disable
services as it needs."
I actually looked at this as well :) I had an idea to set the _OSC bits to 0
when the vmd driver created the domain, then look at all the root ports
underneath it and see if AER and PM were set. If any root port underneath
VMD set AER or PM, then I would set the _OSC bit for the bridge to 1. That
way, if any root port underneath VMD had enabled AER (as an example), then
that feature would still work. I didn't test this in a hypervisor scenario
though, so I'm not sure what I would see.
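
Roughly, I was picturing something like the sketch below (completely
untested; vmd_override_bridge_flags() and the specific capability checks are
only for illustration):

/*
 * Untested sketch of the idea above: start with the OS owning nothing
 * in the VMD domain, then hand a service back if any root port below
 * VMD actually advertises it.  Assumes <linux/pci.h> and that the VMD
 * root bus has already been enumerated.
 */
static void vmd_override_bridge_flags(struct pci_bus *vmd_bus,
                                      struct pci_host_bridge *vmd_bridge)
{
        struct pci_dev *rp;

        vmd_bridge->native_aer = 0;
        vmd_bridge->native_pme = 0;

        list_for_each_entry(rp, &vmd_bus->devices, bus_list) {
                if (pci_pcie_type(rp) != PCI_EXP_TYPE_ROOT_PORT)
                        continue;
                if (rp->aer_cap)        /* AER capability present */
                        vmd_bridge->native_aer = 1;
                if (rp->pme_support)    /* PME supported from some D-state */
                        vmd_bridge->native_pme = 1;
        }
}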
_OSC negotiates ownership of features between platform firmware and
OSPM. The "native_pcie_hotplug" and similar bits mean that "IF a
device advertises the feature, the OS can use it." We clear those
native_* bits if the platform retains ownership via _OSC.
If BIOS doesn't enable the VMD host bridge and doesn't supply _OSC for
the domain below it, why would we assume that BIOS retains ownership
of the features negotiated by _OSC? I think we have to assume the OS
owns them, which is what happened before 04b12ef163d1.
Sorry, this confuses me :) If BIOS doesn't enable VMD (i.e., VMD is
disabled), then all the root ports and devices underneath VMD are visible
to the BIOS and OS, so ACPI would run on all of them and the _OSC bits
should be set correctly.
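
Just to make sure I'm following the mechanics: for a regular host bridge it's
acpi_pci_root_create() that clears the native_* bits when _OSC denies the OS
control, roughly like this (paraphrased/trimmed from drivers/acpi/pci_root.c,
so apologies if it's not exact):

/* Inside acpi_pci_root_create(), after the _OSC negotiation: any
 * feature the OS was not granted control of gets its native_* bit
 * cleared so drivers leave it to the platform.
 */
if (!(root->osc_control_set & OSC_PCI_EXPRESS_NATIVE_HP_CONTROL))
        host_bridge->native_pcie_hotplug = 0;
if (!(root->osc_control_set & OSC_PCI_EXPRESS_AER_CONTROL))
        host_bridge->native_aer = 0;
if (!(root->osc_control_set & OSC_PCI_EXPRESS_PME_CONTROL))
        host_bridge->native_pme = 0;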
There was a previous suggestion to print what VMD is owning. I think
that is a good idea and I can add that to my patch. It would look like
this (following the way OS support gets printed):
VMD supports PCIeHotplug
VMD supports SHPCHotplug
VMD now owns PCIeHotplug
VMD now owns SHPCHotplug
We could probably do the same thing for the other bits (based on the
native_* bits). That might make it clearer...
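
Concretely, something like the untested sketch below (vmd_report_ownership()
is just an illustrative name, and it only covers the "now owns" half, keyed
off the native_* bits):

/*
 * Untested sketch: report which services the vmd driver ends up
 * owning, mirroring the example output above.  Name and exact
 * message text are illustrative only.
 */
static void vmd_report_ownership(struct pci_dev *vmd_pdev,
                                 struct pci_host_bridge *vmd_bridge)
{
        if (vmd_bridge->native_pcie_hotplug)
                pci_info(vmd_pdev, "VMD now owns PCIeHotplug\n");
        if (vmd_bridge->native_shpc_hotplug)
                pci_info(vmd_pdev, "VMD now owns SHPCHotplug\n");
        if (vmd_bridge->native_aer)
                pci_info(vmd_pdev, "VMD now owns AER\n");
        if (vmd_bridge->native_pme)
                pci_info(vmd_pdev, "VMD now owns PME\n");
        if (vmd_bridge->native_ltr)
                pci_info(vmd_pdev, "VMD now owns LTR\n");
        if (vmd_bridge->native_dpc)
                pci_info(vmd_pdev, "VMD now owns DPC\n");
}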
Paul
Bjorn