[+cc Keith]

On Wed, Apr 17, 2024 at 01:15:42PM -0700, Paul M Stillwell Jr wrote:
> Adding documentation for the Intel VMD driver and updating the index
> file to include it.
>
> Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@xxxxxxxxx>
> ---
> Documentation/PCI/controller/vmd.rst | 51 ++++++++++++++++++++++++++++
> Documentation/PCI/index.rst          |  1 +
> 2 files changed, 52 insertions(+)
> create mode 100644 Documentation/PCI/controller/vmd.rst
>
> diff --git a/Documentation/PCI/controller/vmd.rst b/Documentation/PCI/controller/vmd.rst
> new file mode 100644
> index 000000000000..e1a019035245
> --- /dev/null
> +++ b/Documentation/PCI/controller/vmd.rst
> @@ -0,0 +1,51 @@
> +.. SPDX-License-Identifier: GPL-2.0+
> +
> +=================================================================
> +Linux Base Driver for the Intel(R) Volume Management Device (VMD)
> +=================================================================
> +
> +Intel vmd Linux driver.
> +
> +Contents
> +========
> +
> +- Overview
> +- Features
> +- Limitations
> +
> +The Intel VMD provides the means to provide volume management across separate
> +PCI Express HBAs and SSDs without requiring operating system support or
> +communication between drivers. It does this by obscuring each storage
> +controller from the OS, but allowing a single driver to be loaded that would
> +control each storage controller. A Volume Management Device (VMD) provides a
> +single device for a single storage driver. The VMD resides in the IIO root

I'm not sure IIO (and PCH below) are really relevant to this. I think
we really just care about the PCI topology enumerable by the OS.

If they are relevant, expand them on first use as you did for VMD so
we have a hint about how to learn more about it.

> +complex and it appears to the OS as a root bus integrated endpoint. In the IIO,

I suspect "root bus integrated endpoint" means the same as "Root
Complex Integrated Endpoint" as defined by the PCIe spec? If so,
please use that term and capitalize it so there's no confusion.

> +the VMD is in a central location to manipulate access to storage devices which
> +may be attached directly to the IIO or indirectly through the PCH. Instead of
> +allowing individual storage devices to be detected by the OS and allow it to
> +load a separate driver instance for each, the VMD provides configuration
> +settings to allow specific devices and root ports on the root bus to be
> +invisible to the OS.

How are these settings configured? BIOS setup menu?

> +VMD works by creating separate PCI domains for each VMD device in the system.
> +This makes VMD look more like a host bridge than an endpoint so VMD must try
> +to adhere to the ACPI Operating System Capabilities (_OSC) flags of the system.

As Keith pointed out, I think this needs more details about how the
hardware itself works. I don't think there's enough information here
to maintain the OS/platform interface on an ongoing basis.

I think "creating a separate PCI domain" is a consequence of providing
a new config access mechanism, e.g., a new ECAM region, for devices
below the VMD bridge. That hardware mechanism is important to
understand because it means those downstream devices are unknown to
anything that doesn't grok the config access mechanism. For example,
firmware wouldn't know anything about them unless it had a VMD driver.
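To make that concrete, here is a rough sketch of the kind of config
accessor I have in mind for the devices below the VMD bridge. This is
purely illustrative: the struct, field, and function names are made up,
and it is not the actual vmd driver code. The point is that config
space for the new domain is just an ECAM-style window behind a BAR of
the VMD endpoint, so nothing that doesn't know about that BAR can even
see those devices:

  /* Sketch only: hypothetical names, not drivers/pci/controller/vmd.c */
  #include <linux/io.h>
  #include <linux/pci.h>

  struct vmd_dev {
          void __iomem *cfgbar;   /* ECAM-like window mapped from a VMD BAR */
          u8           busn_start; /* first bus number of the VMD domain */
  };

  static void __iomem *vmd_cfg_addr(struct vmd_dev *vmd, struct pci_bus *bus,
                                    unsigned int devfn, int reg)
  {
          /* Standard ECAM layout: bus << 20 | devfn << 12 | register offset */
          u32 offset = ((u32)(bus->number - vmd->busn_start) << 20) |
                       (devfn << 12) | (reg & 0xfff);

          return vmd->cfgbar + offset;
  }

  static int vmd_pci_read(struct pci_bus *bus, unsigned int devfn, int reg,
                          int len, u32 *value)
  {
          struct vmd_dev *vmd = bus->sysdata; /* assumes sysdata holds the VMD */
          void __iomem *addr = vmd_cfg_addr(vmd, bus, devfn, reg);

          switch (len) {
          case 1:
                  *value = readb(addr);
                  break;
          case 2:
                  *value = readw(addr);
                  break;
          case 4:
                  *value = readl(addr);
                  break;
          default:
                  return PCIBIOS_BAD_REGISTER_NUMBER;
          }
          return PCIBIOS_SUCCESSFUL;
  }

  /* .write would mirror .read with writeb/writew/writel */
  static struct pci_ops vmd_ops = {
          .read = vmd_pci_read,
  };

If the real mechanism is different (indirect index/data registers, a
different offset encoding, etc.), that's exactly the kind of detail the
document should spell out.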
Some of the pieces that might help figure this out:

- Which devices (VMD bridge, VMD Root Ports, devices below VMD Root
  Ports) are enumerated in the host?

- Which devices are passed through to a virtual guest and enumerated
  there?

- Where does the vmd driver run (host or guest or both)?

- Who (host or guest) runs the _OSC for the new VMD domain?

- What happens to interrupts generated by devices downstream from VMD,
  e.g., AER interrupts from VMD Root Ports, hotplug interrupts from VMD
  Root Ports or switch downstream ports? Who fields them? In general
  firmware would field them unless it grants ownership via _OSC. If
  firmware grants ownership (or the OS forcibly takes it by overriding
  it for hotplug), I guess the OS that requested ownership would field
  them?

- How do interrupts (hotplug, AER, etc) for things below VMD work?
  Assuming the OS owns the feature, how does the OS discover them? I
  guess probably the usual PCIe Capability and MSI/MSI-X Capabilities?
  Which OS (host or guest) fields them?

> +A couple of the _OSC flags regard hotplug support. Hotplug is a feature that
> +is always enabled when using VMD regardless of the _OSC flags.

We log the _OSC negotiation in dmesg, so if we ignore or override _OSC
for hotplug, maybe that should be made explicit in the logging somehow?

> +Features
> +========
> +
> +- Virtualization
> +- MSIX interrupts
> +- Power Management
> +- Hotplug

s/MSIX/MSI-X/ to match spec usage.

I'm not sure what this list is telling us.

> +Limitations
> +===========
> +
> +When VMD is enabled and used in a hypervisor the _OSC flags provided by the
> +hypervisor BIOS may not be correct. The most critical of these flags are the
> +hotplug bits. If these bits are incorrect then the storage devices behind the
> +VMD will not be able to be hotplugged. The driver always supports hotplug for
> +the devices behind it so the hotplug bits reported by the OS are not used.

"_OSC may not be correct" sounds kind of problematic. How does the OS
deal with this? How does the OS know whether to pay attention to _OSC
or ignore it because it tells us garbage?

If we ignore _OSC hotplug bits because "we know what we want, and we
know we won't conflict with firmware," how do we deal with other _OSC
bits? AER? PME? What about bits that may be added in the future?
Is there some kind of roadmap to help answer these questions?

Bjorn
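P.S. To make the hotplug question concrete, my mental model (again a
sketch with hypothetical names, not the actual driver code) is that the
_OSC results for the real host bridge land in the native_* bits of its
struct pci_host_bridge, the synthesized VMD domain inherits them, and
"hotplug is always enabled" amounts to forcing those particular bits on:

  /* Sketch only: hypothetical helper, not the actual vmd driver code */
  #include <linux/pci.h>

  static void vmd_init_native_flags(struct pci_host_bridge *root_bridge,
                                    struct pci_host_bridge *vmd_bridge)
  {
          /* Inherit whatever ownership firmware granted via _OSC upstream */
          vmd_bridge->native_aer = root_bridge->native_aer;
          vmd_bridge->native_pme = root_bridge->native_pme;
          vmd_bridge->native_ltr = root_bridge->native_ltr;
          vmd_bridge->native_dpc = root_bridge->native_dpc;

          /*
           * ...except hotplug, which the document says is always enabled
           * behind VMD regardless of _OSC. This is the override that
           * arguably deserves an explicit note in dmesg.
           */
          vmd_bridge->native_pcie_hotplug = 1;
          vmd_bridge->native_shpc_hotplug = 1;
  }

If that's roughly right, documenting which bits are inherited and which
are overridden (and logging the override) would answer most of my
questions about AER, PME, and future bits.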