On Fri, Feb 09, 2024 at 03:05:49PM -0600, Bjorn Helgaas wrote:
> On Thu, Feb 01, 2024 at 09:39:49AM +0100, Roger Pau Monné wrote:
> > On Wed, Jan 31, 2024 at 01:00:14PM -0600, Bjorn Helgaas wrote:
> > > On Wed, Jan 31, 2024 at 09:58:19AM +0100, Roger Pau Monné wrote:
> > > > On Tue, Jan 30, 2024 at 02:44:03PM -0600, Bjorn Helgaas wrote:
> > > > > On Tue, Jan 30, 2024 at 10:07:36AM +0100, Roger Pau Monné wrote:
> > > > > > On Mon, Jan 29, 2024 at 04:01:13PM -0600, Bjorn Helgaas wrote:
> > > > > > > On Thu, Jan 25, 2024 at 07:17:24AM +0000, Chen, Jiqian wrote:
> > > > > > > > On 2024/1/24 00:02, Bjorn Helgaas wrote:
> > > > > > > > > On Tue, Jan 23, 2024 at 10:13:52AM +0000, Chen, Jiqian wrote:
> > > > > > > > >> On 2024/1/23 07:37, Bjorn Helgaas wrote:
> > > > > > > > >>> On Fri, Jan 05, 2024 at 02:22:17PM +0800, Jiqian Chen wrote:
> > > > > > > > >>>> There is a need for some scenarios to use gsi sysfs.
> > > > > > > > >>>> For example, when xen passthrough a device to domU, it will
> > > > > > > > >>>> use gsi to map pirq, but currently userspace can't get gsi
> > > > > > > > >>>> number.
> > > > > > > > >>>> So, add gsi sysfs for that and for other potential scenarios.
> > > > > > > > >> ...
> > > > > > > > >
> > > > > > > > >>> I don't know enough about Xen to know why it needs the GSI in
> > > > > > > > >>> userspace. Is this passthrough brand new functionality that can't be
> > > > > > > > >>> done today because we don't expose the GSI yet?
> > > > > > >
> > > > > > > I assume this must be new functionality, i.e., this kind of
> > > > > > > passthrough does not work today, right?
> > > > > > >
> > > > > > > > >> has ACPI support and is responsible for detecting and controlling
> > > > > > > > >> the hardware, also it performs privileged operations such as the
> > > > > > > > >> creation of normal (unprivileged) domains DomUs. When we give to a
> > > > > > > > >> DomU direct access to a device, we need also to route the physical
> > > > > > > > >> interrupts to the DomU. In order to do so Xen needs to setup and map
> > > > > > > > >> the interrupts appropriately.
> > > > > > > > >
> > > > > > > > > What kernel interfaces are used for this setup and mapping?
> > > > > > > >
> > > > > > > > For passthrough devices, the setup and mapping of routing physical
> > > > > > > > interrupts to DomU are done on Xen hypervisor side, hypervisor only
> > > > > > > > needs userspace to provide the GSI info, see Xen code:
> > > > > > > > xc_physdev_map_pirq requires GSI and then will call hypercall to pass
> > > > > > > > GSI into hypervisor and then hypervisor will do the mapping and
> > > > > > > > routing, kernel doesn't do the setup and mapping.
> > > > > > >
> > > > > > > So we have to expose the GSI to userspace not because userspace itself
> > > > > > > uses it, but so userspace can turn around and pass it back into the
> > > > > > > kernel?
> > > > > >
> > > > > > No, the point is to pass it back to Xen, which doesn't know the
> > > > > > mapping between GSIs and PCI devices because it can't execute the ACPI
> > > > > > AML resource methods that provide such information.
> > > > > >
> > > > > > The (Linux) kernel is just a proxy that forwards the hypercalls from
> > > > > > user-space tools into Xen.
> > > > >
> > > > > But I guess Xen knows how to interpret a GSI even though it doesn't
> > > > > have access to AML?
> > > >
> > > > On x86 Xen does know how to map a GSI into an IO-APIC pin, in order to
> > > > configure the RTE as requested.
> > >
> > > IIUC, mapping a GSI to an IO-APIC pin requires information from the
> > > MADT. So I guess Xen does use the static ACPI tables, but not the AML
> > > _PRT methods that would connect a GSI with a PCI device?
> >
> > Yes, Xen can parse the static tables, and knows the base GSI of
> > IO-APICs from the MADT.
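To illustrate what knowing the base GSI buys Xen: each MADT I/O APIC entry gives the first GSI that I/O APIC handles, so a GSI can be resolved to an (I/O APIC, pin) pair with a simple range search. This is a minimal sketch, not Xen's code. The `ioapic_desc` struct, the `gsi_to_ioapic_pin()` helper and the example table values are hypothetical, and the pin count of each I/O APIC is assumed to be known already (it comes from the I/O APIC version register, not from the MADT).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one I/O APIC: the GSI base comes from the
 * MADT "I/O APIC" entry, the pin count from the I/O APIC version register. */
struct ioapic_desc {
    unsigned int id;        /* I/O APIC ID from the MADT entry */
    unsigned int gsi_base;  /* first GSI routed to this I/O APIC */
    unsigned int nr_pins;   /* number of redirection table entries */
};

/* Find the I/O APIC (index into the array) and pin a GSI maps to.
 * Returns 0 on success, -1 if no I/O APIC covers the GSI. */
static int gsi_to_ioapic_pin(const struct ioapic_desc *ioapics, size_t n,
                             unsigned int gsi, unsigned int *apic_idx,
                             unsigned int *pin)
{
    assert(apic_idx != NULL && pin != NULL);

    for (size_t i = 0; i < n; i++) {
        const struct ioapic_desc *a = &ioapics[i];

        /* Each I/O APIC owns the range [gsi_base, gsi_base + nr_pins). */
        if (gsi >= a->gsi_base && gsi - a->gsi_base < a->nr_pins) {
            *apic_idx = (unsigned int)i;
            *pin = gsi - a->gsi_base;
            return 0;
        }
    }
    return -1;
}

/* Example values only, not taken from any real system: a 24-pin I/O APIC
 * starting at GSI 0 and a 32-pin one starting at GSI 24. */
static const struct ioapic_desc example_ioapics[] = {
    { .id = 8, .gsi_base = 0,  .nr_pins = 24 },
    { .id = 9, .gsi_base = 24, .nr_pins = 32 },
};
```

With these example values, GSI 30 resolves to pin 6 of the second I/O APIC, while GSI 56 is covered by neither and the lookup fails. Note that nothing here involves a PCI device: that link only comes from evaluating _PRT in Dom0.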
> >
> > > I guess this means Xen would not be able to deal with _MAT methods,
> > > which also contain MADT entries? I don't know the implications of
> > > this -- maybe it means Xen might not be able to use with hot-added
> > > devices?
> >
> > It's my understanding _MAT will only be present on some very specific
> > devices (IO-APIC or CPU objects). Xen doesn't support hotplug of
> > IO-APICs, but hotplug of CPUs should in principle be supported with
> > cooperation from the control domain OS (albeit it's not something that
> > we test in our CI). I don't expect however that a CPU object _MAT
> > method will return IO APIC entries.
> >
> > > The tables (including DSDT and SSDTs that contain the AML) are exposed
> > > to userspace via /sys/firmware/acpi/tables/, but of course that
> > > doesn't mean Xen knows how to interpret the AML, and even if it did,
> > > Xen probably wouldn't be able to *evaluate* it since that could
> > > conflict with the host kernel's use of AML.
> >
> > Indeed, there can only be a single OSPM, and that's the dom0 OS (Linux
> > in our context).
> >
> > Getting back to our context though, what would be a suitable place for
> > exposing the GSI assigned to each device?
>
> IIUC, the Xen hypervisor:
>
> - Interprets /sys/firmware/acpi/tables/APIC (or gets this via
>   something running on the Dom0 kernel) to find the physical base
>   address and GSI base, e.g., from I/O APIC, I/O SAPIC.

No, Xen parses the MADT directly from memory, before starting dom0.
That's a static table, so it's fine for Xen to parse it and obtain the
I/O APIC GSI base.

> - Needs the GSI to locate the APIC and pin within the APIC. The
>   Dom0 kernel is the OSPM, so only it can evaluate the AML _PRT to
>   learn the PCI device -> GSI mapping.

Yes, Xen doesn't know the PCI device -> GSI mapping. Dom0 needs to
evaluate the ACPI methods and signal Xen to configure a GSI with a
given trigger mode and polarity.
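For illustration, once Dom0 has evaluated _PRT and passed the GSI, trigger mode and polarity to Xen, the low half of the resulting redirection table entry (RTE) could be built roughly as below. This is a minimal sketch assuming the Intel 82093AA register layout (vector in bits 0-7, polarity in bit 13, trigger mode in bit 15, mask in bit 16) with fixed delivery mode and physical destination mode. The `rte_low()` helper and the enum names are hypothetical; the destination field (in the high dword) and the actual IOREGSEL/IOWIN register writes are omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Trigger mode and polarity as reported for a GSI. Names are illustrative. */
enum gsi_trigger  { TRIGGER_EDGE = 0, TRIGGER_LEVEL = 1 };
enum gsi_polarity { POLARITY_HIGH = 0, POLARITY_LOW = 1 };

/* Bit positions in the low dword of an I/O APIC RTE (82093AA layout). */
#define RTE_POLARITY_SHIFT 13          /* 0 = active high, 1 = active low */
#define RTE_TRIGGER_SHIFT  15          /* 0 = edge, 1 = level */
#define RTE_MASK_BIT       (1u << 16)  /* 1 = interrupt masked */

/* Build the low 32 bits of an RTE. Delivery mode (bits 8-10) and
 * destination mode (bit 11) are left at 0: fixed delivery, physical. */
static uint32_t rte_low(uint8_t vector, enum gsi_trigger trig,
                        enum gsi_polarity pol, int masked)
{
    /* Vectors 0x00-0x0F are not valid for I/O APIC delivery. */
    assert(vector >= 0x10);

    uint32_t v = vector;
    v |= (uint32_t)pol << RTE_POLARITY_SHIFT;
    v |= (uint32_t)trig << RTE_TRIGGER_SHIFT;
    if (masked)
        v |= RTE_MASK_BIT;
    return v;
}
```

PCI INTx lines routed through _PRT are typically level-triggered and active-low, which for vector 0x30 gives 0xA030 in this sketch. Getting either bit wrong is why the trigger mode and polarity, and not just the GSI, have to come from Dom0.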
> - Has direct access to the APIC physical base address to program the
>   Redirection Table.

Yes, the hardware (native) I/O APIC is owned by Xen, and is not directly
accessible by dom0.

> The patch seems a little messy to me because the PCI core has to keep
> track of the GSI even though it doesn't need it itself. And the
> current patch exposes it on all arches, even non-ACPI ones or when
> ACPI is disabled (easily fixable).
>
> We only call acpi_pci_irq_enable() in the pci_enable_device() path, so
> we don't know the GSI unless a Dom0 driver has claimed the device and
> called pci_enable_device() for it, which seems like it might not be
> desirable.

I think that's always the case: on dom0, devices to be passed through
are handled by pciback, which does enable them.

I agree it might be best not to tie exposing the node to
pci_enable_device() having been called. Is _PRT only evaluated as part
of acpi_pci_irq_enable() (or pci_enable_device())?

> I was hoping we could put it in /sys/firmware/acpi/interrupts, but
> that looks like it's only for SCI statistics. I guess we could moot a
> new /sys/firmware/acpi/gsi/ directory, but then each file there would
> have to identify a device, which might not be as convenient as the
> /sys/devices/ directory that already exists. I guess there may be
> GSIs for things other than PCI devices; will you ever care about any
> of those?

We only support passthrough of PCI devices so far, but if any such
non-PCI devices that use a GSI ever appear, and Xen supports
passthrough for them, then yes, we would need to fetch their GSIs
somehow.

Thanks, Roger.