Re: [PATCH] Documentation: PCI: add vmd documentation

On 4/19/2024 2:14 PM, Bjorn Helgaas wrote:
[+cc Kai-Heng, Rafael, Lorenzo since we're talking about 04b12ef163d1
("PCI: vmd: Honor ACPI _OSC on PCIe features")]

On Thu, Apr 18, 2024 at 02:51:19PM -0700, Paul M Stillwell Jr wrote:
On 4/18/2024 11:26 AM, Bjorn Helgaas wrote:
On Wed, Apr 17, 2024 at 01:15:42PM -0700, Paul M Stillwell Jr wrote:
Adding documentation for the Intel VMD driver and updating the index
file to include it.

Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@xxxxxxxxx>
---
   Documentation/PCI/controller/vmd.rst | 51 ++++++++++++++++++++++++++++
   Documentation/PCI/index.rst          |  1 +
   2 files changed, 52 insertions(+)
   create mode 100644 Documentation/PCI/controller/vmd.rst

diff --git a/Documentation/PCI/controller/vmd.rst b/Documentation/PCI/controller/vmd.rst
new file mode 100644
index 000000000000..e1a019035245
--- /dev/null
+++ b/Documentation/PCI/controller/vmd.rst
@@ -0,0 +1,51 @@
+.. SPDX-License-Identifier: GPL-2.0+
+
+=================================================================
+Linux Base Driver for the Intel(R) Volume Management Device (VMD)
+=================================================================
+
+Intel vmd Linux driver.
+
+Contents
+========
+
+- Overview
+- Features
+- Limitations
+
+The Intel VMD provides the means to provide volume management across separate
+PCI Express HBAs and SSDs without requiring operating system support or
+communication between drivers. It does this by obscuring each storage
+controller from the OS, but allowing a single driver to be loaded that would
+control each storage controller. A Volume Management Device (VMD) provides a
+single device for a single storage driver. The VMD resides in the IIO root

I'm not sure IIO (and PCH below) are really relevant to this.  I think

I'm trying to describe where in the CPU architecture VMD exists because it's
not like other devices. It's not like a storage device or networking device
that is plugged in somewhere; it exists as part of the CPU (in the IIO). I'm
ok removing it, but it might be confusing to someone looking at the
documentation. I'm also close to this so it may be clear to me, but
confusing to others (which I know it is) so any help making it clearer would
be appreciated.

The vmd driver binds to a PCI device, and it doesn't matter where it's
physically implemented.


Ok

we really just care about the PCI topology enumerable by the OS.  If
they are relevant, expand them on first use as you did for VMD so we
have a hint about how to learn more about it.

I don't fully understand this comment. The PCI topology behind VMD is not
enumerable by the OS unless we are considering the vmd driver the OS. If the
user enables VMD in the BIOS and the vmd driver isn't loaded, then the OS
never sees the devices behind VMD.

The only reason the devices are seen by the OS is that the VMD driver does
some mapping when the VMD driver loads during boot.

Yes.  I think it's worth mentioning these details about the hierarchy
behind VMD, e.g., how their config space is reached, how driver MMIO
accesses are routed, how device interrupts are routed, etc.

The 185a383ada2e ("x86/PCI: Add driver for Intel Volume Management
Device (VMD)") commit log mentioned by Keith is a fantastic start
and answers several things below.  Putting it in this doc would be
great because that commit is not very visible.


I'll look at it and work it in (either replace this overview or add to it)
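
To make the config-access part concrete for the doc, what the driver ends up
doing is roughly the following. This is only a sketch of the shape of it (the
struct layout and the vmd_from_bus() helper are assumptions here), not the
literal drivers/pci/controller/vmd.c code:

#include <linux/pci.h>
#include <linux/pci-ecam.h>

struct vmd_dev {
    struct pci_dev *dev;
    void __iomem *cfgbar;   /* mapped from the RCiEP BAR that holds the config window */
};

/* Translate bus/devfn/reg into an offset inside the VMD config window */
static void __iomem *vmd_cfg_addr(struct vmd_dev *vmd, struct pci_bus *bus,
                                  unsigned int devfn, int reg)
{
    return vmd->cfgbar + PCIE_ECAM_OFFSET(bus->number, devfn, reg);
}

static int vmd_pci_read(struct pci_bus *bus, unsigned int devfn, int reg,
                        int len, u32 *value)
{
    struct vmd_dev *vmd = vmd_from_bus(bus);      /* assumed helper */
    void __iomem *addr = vmd_cfg_addr(vmd, bus, devfn, reg);

    switch (len) {
    case 1:
        *value = readb(addr);
        break;
    case 2:
        *value = readw(addr);
        break;
    case 4:
        *value = readl(addr);
        break;
    default:
        return PCIBIOS_BAD_REGISTER_NUMBER;
    }
    return PCIBIOS_SUCCESSFUL;
}

static struct pci_ops vmd_ops = {
    .read  = vmd_pci_read,
    /* .write is the mirror image */
};

In other words, nothing outside the vmd driver (including firmware) knows how
to reach this config space, which is the point Bjorn is making above.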

+the VMD is in a central location to manipulate access to storage devices which
+may be attached directly to the IIO or indirectly through the PCH. Instead of
+allowing individual storage devices to be detected by the OS and allow it to
+load a separate driver instance for each, the VMD provides configuration
+settings to allow specific devices and root ports on the root bus to be
+invisible to the OS.

How are these settings configured?  BIOS setup menu?

I believe there are 2 ways this is done:

The first is that the system designer creates a design such that some root
ports and end points are behind VMD. If VMD is enabled in the BIOS then
these devices don't show up to the OS and require a driver to use them (the
vmd driver). If VMD is disabled in the BIOS then the devices are seen by the
OS at boot time.

The second way is that there are settings in the BIOS for VMD. I don't think
there are many settings... it's mostly enable/disable VMD

I think what I want to know here is just "there's a BIOS switch that
enables VMD, resulting in only the VMD RCiEP being visible, or
disables VMD, resulting in the VMD RCiEP not being visible (?) and the
VMD Root Ports and downstream hierarchies being just normal Root
Ports."  You can wordsmith that; I just wanted to know what
"configuration settings" referred to so users would know where they
live and what they mean.


Got it, will update to reflect that setup/config is through the BIOS

+VMD works by creating separate PCI domains for each VMD device in the system.
+This makes VMD look more like a host bridge than an endpoint so VMD must try
+to adhere to the ACPI Operating System Capabilities (_OSC) flags of the system.

I think "creating a separate PCI domain" is a consequence of providing
a new config access mechanism, e.g., a new ECAM region, for devices
below the VMD bridge.  That hardware mechanism is important to
understand because it means those downstream devices are unknown to
anything that doesn't grok the config access mechanism.  For example,
firmware wouldn't know anything about them unless it had a VMD driver.

    - Which devices (VMD bridge, VMD Root Ports, devices below VMD Root
      Ports) are enumerated in the host?

Only the VMD device (as a PCI endpoint) is seen by the OS without the vmd
driver.

This is the case when VMD is enabled by BIOS, right?  And when the vmd

Yes

driver sets up the new config space and enumerates behind VMD, we'll
find the VMD Root Ports and the hierarchies below them?


Yes
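
Roughly, once the config accessors are installed the driver creates the new
root bus and scans it, something like this (sketch only; names such as
vmd_ops, vmd->sysdata, and starting at bus 0 are assumptions for
illustration):

    /* inside the driver's probe path, after vmd_ops is set up (sketch) */
    LIST_HEAD(resources);   /* bus numbers + MMIO apertures carved from the VMD BARs */

    vmd->bus = pci_create_root_bus(&vmd->dev->dev, 0, &vmd_ops,
                                   &vmd->sysdata, &resources);
    if (!vmd->bus)
        return -ENODEV;

    pci_rescan_bus(vmd->bus);   /* finds the VMD Root Ports and everything below them */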

If BIOS leaves the VMD functionality disabled, I guess those VMD Root
Ports and the hierarchies below them are enumerated normally, without
the vmd driver being involved?  (Maybe the VMD RCiEP itself is

Yes

disabled, so we won't enumerate it and we won't try to bind the vmd
driver to it?)


Yes

    - Which devices are passed through to a virtual guest and enumerated
      there?

All devices under VMD are passed to a virtual guest

So the guest will see the VMD Root Ports, but not the VMD RCiEP
itself?


My understanding is that the guest will see the VMD device, and the vmd driver in the guest will then enumerate the devices behind it.

    - Where does the vmd driver run (host or guest or both)?

I believe the answer is both.

If the VMD RCiEP isn't passed through to the guest, how can the vmd
driver do anything in the guest?


The VMD device is passed through to the guest. It works just like bare metal: the guest OS detects the VMD device and loads the vmd driver, which then enumerates the devices into the guest.

    - Who (host or guest) runs the _OSC for the new VMD domain?

I believe the answer here is neither :) This has been an issue since commit
04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on PCIe features"). I've submitted
this patch (https://lore.kernel.org/linux-pci/20240408183927.135-1-paul.m.stillwell.jr@xxxxxxxxx/)
to attempt to fix the issue.

IIUC we make no attempt to evaluate _OSC for the new VMD domain (and
firmware likely would not supply one anyway since it doesn't know how
to enumerate anything there).  So 04b12ef163d1 just assumes the _OSC
that applies to the VMD RCiEP also applies to the new VMD domain below
the VMD.

If that's what we want to happen, we should state this explicitly.
But I don't think it *is* what we want; see below.

You are much more of an expert in this area than I am, but as far as I can
tell the only way the _OSC bits get cleared is via ACPI (specifically this
code https://elixir.bootlin.com/linux/latest/source/drivers/acpi/pci_root.c#L1038).
Since ACPI doesn't run on the devices behind VMD the _OSC bits don't get set
properly for them.

Apparently there's *something* in ACPI related to the VMD domain
because 59dc33252ee7 ("PCI: VMD: ACPI: Make ACPI companion lookup work
for VMD bus") seems to add support for ACPI companions for devices in
the VMD domain?


I've seen this code also and don't understand what it's supposed to do. I'll investigate this further

I don't understand what's going on here because if BIOS enables VMD,
firmware would see the VMD RCiEP but not devices behind it.  And if
BIOS disables VMD, those devices should all appear normally and we
wouldn't need 59dc33252ee7.


I agree, I'll try to figure this out and get an answer

Ultimately, the only _OSC bits that VMD cares about are the hotplug bits,
because hotplug is a feature of our device; it enables hotplug in guests where
there is otherwise no way to enable it. That's why my patch sets the hotplug
bits all the time and copies the other _OSC bits: there is no other way to
enable this feature (i.e., there is no user space tool to enable/disable it).
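
In rough code terms, the intent is something like this (a sketch of the
intent using the struct pci_host_bridge native_* flags; the function name is
made up and this is not the literal patch):

static void vmd_set_native_features(struct pci_host_bridge *root_bridge,
                                    struct pci_host_bridge *vmd_bridge)
{
    /* Hotplug is a feature of the VMD domain itself, so always take it */
    vmd_bridge->native_pcie_hotplug = 1;
    vmd_bridge->native_shpc_hotplug = 1;

    /*
     * For everything else, follow the _OSC result from the domain that
     * contains the VMD RCiEP, as 04b12ef163d1 does today.
     */
    vmd_bridge->native_aer = root_bridge->native_aer;
    vmd_bridge->native_pme = root_bridge->native_pme;
    vmd_bridge->native_ltr = root_bridge->native_ltr;
    vmd_bridge->native_dpc = root_bridge->native_dpc;
}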

And here's the problem, because the _OSC that applies to domain X that
contains the VMD RCiEP may result in the platform retaining ownership
of all these PCIe features (hotplug, AER, PME, etc), but you want OSPM
to own them for the VMD domain Y, regardless of whether it owns them
for domain X.


I thought each domain gets its own _OSC. In the case of VMD, it creates a new domain, so the _OSC for that domain should be determined somehow. Normally that determination would be done via ACPI (if I'm understanding things correctly), and in the case of VMD I am saying that hotplug should always be enabled and we can use the _OSC from another domain for the other bits.

I went down a rabbit hole when I started working on this and had an idea that maybe the VMD driver should create an ACPI table, but that seemed way more complicated than it needs to be. Plus I wasn't sure what good it would do because I don't understand the relationship between the root ports below VMD and the settings we would use for the bridge.

OSPM *did* own all PCIe features before 04b12ef163d1 because the new
VMD "host bridge" got native owership by default.

    - What happens to interrupts generated by devices downstream from
      VMD, e.g., AER interrupts from VMD Root Ports, hotplug interrupts
      from VMD Root Ports or switch downstream ports?  Who fields them?
      In general firmware would field them unless it grants ownership
      via _OSC.  If firmware grants ownership (or the OS forcibly takes
      it by overriding it for hotplug), I guess the OS that requested
      ownership would field them?

The interrupts are passed through VMD to the OS. This was the AER issue that
resulted in commit 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on PCIe
features"). IIRC AER was disabled in the BIOS, but is was enabled in the VMD
host bridge because pci_init_host_bridge() sets all the bits to 1 and that
generated an AER interrupt storm.
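
(For reference, these are the defaults I mean; paraphrasing
pci_init_host_bridge() in drivers/pci/probe.c from memory, so check the
current tree for the exact list:)

    /* every new host bridge starts out claiming native control of everything */
    bridge->native_pcie_hotplug = 1;
    bridge->native_shpc_hotplug = 1;
    bridge->native_aer = 1;
    bridge->native_pme = 1;
    bridge->native_ltr = 1;
    bridge->native_dpc = 1;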

In bare metal scenarios the _OSC bits are correct, but in a hypervisor
scenario the bits are wrong because they are all 0 regardless of what the
ACPI tables indicate. The challenge is that the VMD driver has no way to
know it's in a hypervisor to set the hotplug bits correctly.

This is the problem.  We claim "the bits are wrong because they are
all 0", but I don't think we can derive that conclusion from anything
in the ACPI, PCIe, or PCI Firmware specs, which makes it hard to
maintain.


I guess I look at it this way: if I run a bare metal OS and hotplug and AER are enabled, and then I run a guest on the same system and all the _OSC bits are disabled, then I think the bits aren't correct. But there isn't any way to detect that through the "normal" channels (ACPI, PCIe, etc.).

IIUC, the current situation is "regardless of what firmware said, in
the VMD domain we want AER disabled and hotplug enabled."


We aren't saying we want AER disabled, we are just saying we want hotplug enabled. The observation is that in a hypervisor scenario AER is going to be disabled because the _OSC bits are all 0.

04b12ef163d1 disabled AER by copying the _OSC that applied to the VMD
RCiEP (although this sounds broken because it probably left AER
*enabled* if that _OSC happened to grant ownership to the OS).

And your patch at
https://lore.kernel.org/linux-pci/20240408183927.135-1-paul.m.stillwell.jr@xxxxxxxxx/
enables hotplug by setting native_pcie_hotplug to 1.

It seems like the only clear option is to say "the vmd driver owns all
PCIe services in the VMD domain, the platform does not supply _OSC for
the VMD domain, the platform can't do anything with PCIe services in
the VMD domain, and the vmd driver needs to explicitly enable/disable
services as it needs."


I actually looked at this as well :) I had an idea to set the _OSC bits to 0 when the vmd driver created the domain, then look at all the root ports underneath it and see if AER and PM were set. If any root port underneath VMD had AER or PM set, then I would set the corresponding _OSC bit for the bridge to 1. That way, if any root port underneath VMD had enabled AER (as an example), that feature would still work. I didn't test this in a hypervisor scenario though, so I'm not sure what I would see.
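
In code terms the idea would be roughly this. It's a sketch with heavy
assumptions; in particular, using the Device Control error reporting enables
as the "is AER set on this Root Port" signal is just one possible reading of
"AER was set", and the function name is made up:

static void vmd_scan_root_port_features(struct pci_bus *vmd_bus,
                                        struct pci_host_bridge *vmd_bridge)
{
    struct pci_dev *rp;
    u16 devctl;

    vmd_bridge->native_aer = 0;

    list_for_each_entry(rp, &vmd_bus->devices, bus_list) {
        if (pci_pcie_type(rp) != PCI_EXP_TYPE_ROOT_PORT)
            continue;
        if (!pci_find_ext_capability(rp, PCI_EXT_CAP_ID_ERR))
            continue;
        pcie_capability_read_word(rp, PCI_EXP_DEVCTL, &devctl);
        if (devctl & (PCI_EXP_DEVCTL_CERE | PCI_EXP_DEVCTL_NFERE |
                      PCI_EXP_DEVCTL_FERE | PCI_EXP_DEVCTL_URRE))
            vmd_bridge->native_aer = 1; /* some Root Port below VMD has AER set */
    }

    /* PM (native_pme) could be handled the same way */
}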

Would that be an acceptable idea?

Paul

Bjorn




