Hi Alex,

Could you please refer to my previous message?

Thank you in advance,
Grzegorz

On Thu, 9 Mar 2023 at 14:41, Grzegorz Jaszczyk <jaz@xxxxxxxxxxxx> wrote:
>
> On Thu, 9 Mar 2023 at 00:38, Alex Williamson <alex.williamson@xxxxxxxxxx> wrote:
> >
> > On Wed, 8 Mar 2023 14:44:28 -0800
> > Dominik Behr <dbehr@xxxxxxxxxx> wrote:
> >
> > > On Wed, Mar 8, 2023 at 12:06 PM Alex Williamson
> > > <alex.williamson@xxxxxxxxxx> wrote:
> > > >
> > > > On Wed, 8 Mar 2023 10:45:51 -0800
> > > > Dominik Behr <dbehr@xxxxxxxxxxxx> wrote:
> > > >
> > > > > It is the same interface through which other ACPI events, like AC
> > > > > adapter and LID, are forwarded to user-space.
> > > > > ACPI events are not particularly high frequency like interrupts.
> > > >
> > > > I'm not sure that's relevant; these interfaces don't proclaim to
> > > > provide isolation among host processes which manage behavior relative
> > > > to accessories.  These are effectively system-level services.  It's
> > > > only a very, very specialized use case that places a VMM as a peer
> > > > among these processes.  Generally we don't want to grant a VMM any
> > > > privileges beyond what it absolutely needs, so a VMM managing an
> > > > assigned NIC really ought not to be able to snoop host events related
> > > > to anything other than the NIC.
> > >
> > > How is that related to the fact that we are forwarding VFIO-PCI events
> > > to netlink?  The kernel does not grant any privileges to the VMM.
> > > There are already other ACPI events on netlink.  The implementer of the
> > > VMM can choose whether to allow the VMM to snoop them.
> > > In our case our VMM (crosvm) already snoops LID, battery and AC
> > > adapter events so the guest can adjust its behavior accordingly.
> > > This change just adds another class of ACPI events that are forwarded
> > > to netlink.
> >
> > That's true, it is the VMM's choice whether to allow snooping netlink,
> > but this is being proposed as THE solution to allow VMMs to receive
> > ACPI events related to vfio assigned devices.  If the solution
> > inherently requires escalating the VMM privileges to see all netlink
> > events, that's a weakness in the proposal.  As noted previously,
> > there's also no introspection here; the VMM can't know whether it
> > should listen to netlink for ACPI events or include AML related to a
> > GPE for the device.  It cannot determine whether the kernel supports
> > this feature or whether the device has an ACPI companion that can
> > generate these events.
>
> To be precise, the VMM doesn't listen to all netlink events: it listens
> only to the "acpi_event" family and the ACPI-related multicast group,
> which means it listens to all events generated through
> acpi_bus_generate_netlink_event().
>
> Before sending this patch I thought about using eventfd instead of
> netlink, which would actually provide a channel associated with a given
> device, so such notifications would be received only by the VMM
> associated with that device.  Nevertheless, it seems that eventfd can
> signal that an event happened (notify on a given device) but cannot
> carry any payload, so in our case there is no room for propagating the
> notification value via eventfd.  Is there another eventfd-like mechanism
> that would allow us to achieve the above?
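>
> For reference, a rough (untested) sketch of what such a listener looks
> like from userspace with libnl-genl-3 is below.  The "acpi_event" family
> and "acpi_mc_group" multicast group names come from drivers/acpi/event.c;
> the payload struct has no uapi header, so it is mirrored locally here
> (layout assumed), the same way acpid does:
>
>     #include <stdio.h>
>     #include <stdint.h>
>     #include <netlink/netlink.h>
>     #include <netlink/genl/genl.h>
>     #include <netlink/genl/ctrl.h>
>
>     /* Local mirror of the kernel's struct acpi_genl_event (layout assumed,
>      * there is no uapi header for it). */
>     struct acpi_genl_event {
>             char device_class[20];
>             char bus_id[15];
>             uint32_t type;
>             uint32_t data;
>     };
>
>     static int handle_event(struct nl_msg *msg, void *arg)
>     {
>             /* Attribute 1 (ACPI_GENL_ATTR_EVENT) carries the event payload. */
>             struct nlattr *attr =
>                     genlmsg_attrdata(genlmsg_hdr(nlmsg_hdr(msg)), 0);
>
>             if (attr && nla_len(attr) >= (int)sizeof(struct acpi_genl_event)) {
>                     struct acpi_genl_event *ev = nla_data(attr);
>
>                     printf("class=%.20s bus_id=%.15s type=0x%x data=0x%x\n",
>                            ev->device_class, ev->bus_id,
>                            (unsigned)ev->type, (unsigned)ev->data);
>             }
>             return NL_OK;
>     }
>
>     int main(void)
>     {
>             struct nl_sock *sock = nl_socket_alloc();
>
>             /* Error handling omitted to keep the sketch short. */
>             genl_connect(sock);
>
>             /* Join the group used by acpi_bus_generate_netlink_event(). */
>             int grp = genl_ctrl_resolve_grp(sock, "acpi_event", "acpi_mc_group");
>             nl_socket_add_membership(sock, grp);
>
>             /* Multicast notifications are unsolicited: drop sequence checks. */
>             nl_socket_disable_seq_check(sock);
>             nl_socket_modify_cb(sock, NL_CB_VALID, NL_CB_CUSTOM, handle_event, NULL);
>
>             for (;;)
>                     nl_recvmsgs_default(sock);
>     }
>
> As the sketch shows, a socket subscribed to that group sees every ACPI
> event published there, not just the ones for the assigned device, which
> is why a per-device channel would be preferable.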
>
> If there is no such mechanism, maybe instead of using the existing ACPI
> netlink events, which are associated with the "acpi_event" netlink family
> and the ACPI multicast group, we could create a different netlink family
> per vfio-pci device, or probably reuse the "acpi_event" family but use a
> different multicast group, so each device would have a dedicated netlink
> channel.  Does that seem reasonable?
>
> > > > > > What sort of ACPI events are we expecting to see here and what
> > > > > > does user space do with them?
> > > > >
> > > > > The use we are looking at right now is D-notifier events about the
> > > > > GPU power available to mobile discrete GPUs.
> > > > > The firmware notifies the GPU driver and resource daemon to
> > > > > dynamically adjust the amount of power that can be used by the GPU.
> > > > >
> > > > > > The proposed interface really has no introspection, how does the VMM
> > > > > > know which devices need ACPI tables added "upfront"?  How do these
> > > > > > events factor into hotplug device support, where we may not be able
> > > > > > to dynamically inject ACPI code into the VM?
> > > > >
> > > > > The VMM can examine PCI IDs and the associated firmware node of the
> > > > > PCI device to figure out what events to expect and what ACPI table to
> > > > > generate to support it, but that should not be necessary.
> > > >
> > > > I'm not entirely sure where your VMM is drawing the line between the VM
> > > > and management tools, but I think this is another case where the
> > > > hypervisor itself should not have privileges to examine the host
> > > > firmware tables to build its own.  Something like libvirt would be
> > > > responsible for that.
> > >
> > > Yes, but that depends on the design of the hypervisor and VMM and is not
> > > related to this patch.
> >
> > It is very much related to this patch if it proposes an interface to
> > solve a problem which is likely not compatible with the security model
> > of other VMMs.  We need a single solution to support all VMMs.
> >
> > > > > A generic GPE-based ACPI event forwarder as Grzegorz proposed can be
> > > > > injected at VM init time and handle any notification that comes later,
> > > > > even from hotplug devices.
> > > >
> > > > It appears that forwarder is sending the notify to a specific ACPI
> > > > device node, so it's unclear to me how that becomes boilerplate AML
> > > > added to all VMs.  We'll need to notify different devices based on
> > > > different events, right?
> > >
> > > Valid point.  The notifications have a "scope" ACPI path.
> > > In my experience these events are consumed without looking where they
> > > came from, but I believe the patch can be extended to provide the ACPI
> > > path, in your example "_SB.PCI0.GPP0.PEGP" instead of a generic
> > > vfio_pci string, which the VMM could use to translate to an equivalent
> > > ACPI path in the guest and pass it to a generic ACPI GPE-based notifier
> > > via shared memory.  Grzegorz, could you chime in on whether that would
> > > be possible?
> >
> > So effectively we're imposing the host ACPI namespace on the VM, or at
> > least a mapping between the host and VM namespace?  The generality of
> > this is not improving.
>
> Yes, in the example VMM implementation we have a mapping between the
> host PCI device address and the guest PCI device.  Therefore the VMM
> knows, based on the device name (BDF) sent via netlink, to which guest
> device this notification should be propagated.
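>
> As a purely hypothetical illustration (this is not the actual crosvm
> code, and all names and entries are made up), the VMM-side lookup can be
> as simple as a table keyed by the host BDF carried in the netlink event:
>
>     /* Hypothetical mapping: host BDF -> guest ACPI path / pre-allocated GPE. */
>     #include <stddef.h>
>     #include <string.h>
>
>     struct assigned_dev {
>             const char *host_bdf;    /* BDF reported via netlink, e.g. "0000:01:00.0" */
>             const char *guest_path;  /* ACPI path of the device in the guest DSDT */
>             unsigned int guest_gpe;  /* GPE number reserved for this device */
>     };
>
>     static const struct assigned_dev assigned_devs[] = {
>             { "0000:01:00.0", "\\_SB.PCI0.S00", 0 },   /* example entries only */
>             { "0000:03:00.4", "\\_SB.PCI0.S01", 1 },
>     };
>
>     static const struct assigned_dev *lookup_by_bdf(const char *bdf)
>     {
>             for (size_t i = 0; i < sizeof(assigned_devs) / sizeof(assigned_devs[0]); i++)
>                     if (strcmp(assigned_devs[i].host_bdf, bdf) == 0)
>                             return &assigned_devs[i];
>             return NULL;    /* not an assigned device: ignore the event */
>     }
>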
> The boilerplate AML is added to each vfio-pci device which belongs to
> the VMM, and each vfio-pci device has an associated pre-allocated GPE,
> so the VMM knows which GPE should be triggered to replicate the
> notification for a given device.  BTW, this is only the current WIP VMM
> implementation - it could probably be optimized if needed.
>
> Handling hotplug devices is more problematic.  I see that the kernel
> provides a runtime ACPI patching mechanism:
> https://www.kernel.org/doc/html/latest/firmware-guide/acpi/method-customizing.html
> (which I never tried), but I am not even sure how the VMM could take
> advantage of it.  BTW, this made me realize that the same problem with
> hotplug applies to other vfio-pci use-cases, e.g. runtime PM, which
> relies on a guest virtual ACPI method:
> https://patchwork.kernel.org/project/linux-pm/patch/20220829114850.4341-5-abhsahu@xxxxxxxxxx/.
> Generating virtual ACPI content for hotplug devices seems like a more
> generic issue.
>
> > > > > > The acpi_bus_generate_netlink_event() below really only seems to
> > > > > > form a u8 event type from the u32 event.  Is this something that
> > > > > > could be provided directly from the vfio device uAPI with an
> > > > > > ioeventfd, thus providing introspection that a device supports ACPI
> > > > > > event notifications and the ability for the VMM to exclusively
> > > > > > monitor those events, and only those events for the device, without
> > > > > > additional privileges?
> > > > >
> > > > > From what I can see these events are 8 bit as they come from ACPI.
> > > > > They also do not carry any payload and it is up to the receiving
> > > > > driver to query any additional context/state from the device.
> > > > > This will work the same in the VM, where the driver can query the
> > > > > same information from the passed-through PCI device.
> > > > > There are multiple other netlink-based ACPI event forwarders which
> > > > > do exactly the same thing for other devices like AC adapter, lid/power
> > > > > button, ACPI thermal notifications, etc.
> > > > > They all use the same mechanism and can be received by user-space
> > > > > programs, whether VMMs or others.
> > > >
> > > > But again, those other receivers are potentially system services, not
> > > > an isolated VM instance operating in a limited privilege environment.
> > > > IMO, it's very different if the host display server has access to lid
> > > > or power events than it is to allow some arbitrary VM that happens to
> > > > have an unrelated assigned device that same privilege.
> > >
> > > Therefore these VFIO-related ACPI events could be received by a system
> > > service via this netlink event and selectively forwarded to the VMM if
> > > that is the desire of whoever implements the userspace.
> > > This is outside the scope of this patch.  In our case our VMM does
> > > receive these LID, AC or battery events.
> >
> > But this is backwards, we're presupposing the choice to use netlink
> > based on the convenience of one VMM, which potentially creates
> > obstacles, maybe even security isolation issues, for other VMMs.  The
> > method of delivering ACPI events to a VMM is very much within the scope
> > of this proposal.
> > Thanks,
> >
> > Alex
>
> Regarding:
>
> > > > On my laptop, I see multiple _GPE scopes, each apparently very unique
> > > > to the devices:
> > > >
> > > >     Scope (_GPE)
> > > >     {
> > > >         Method (_L0C, 0, Serialized)  // _Lxx: Level-Triggered GPE, xx=0x00-0xFF
> > > >         {
> > > >             Notify (\_SB.PCI0.GPP0.PEGP, 0x81) // Information Change
> > > >         }
> > > >
> > > >         Method (_L0D, 0, Serialized)  // _Lxx: Level-Triggered GPE, xx=0x00-0xFF
> > > >         {
> > > >             Notify (\_SB.PCI0.GPP0.PEGP, 0x81) // Information Change
> > > >         }
> > > >
> > > >         Method (_L0F, 0, Serialized)  // _Lxx: Level-Triggered GPE, xx=0x00-0xFF
> > > >         {
> > > >             Notify (\_SB.PCI0.GPP0.PEGP, 0x81) // Information Change
> > > >         }
> > > >     }
> > > >
> > > >     Scope (_GPE)
> > > >     {
> > > >         Method (_L19, 0, NotSerialized)  // _Lxx: Level-Triggered GPE, xx=0x00-0xFF
> > > >         {
> > > >             Notify (\_SB.PCI0.GP17, 0x02) // Device Wake
> > > >             Notify (\_SB.PCI0.GP17.XHC0, 0x02) // Device Wake
> > > >             Notify (\_SB.PCI0.GP17.XHC1, 0x02) // Device Wake
> > > >             Notify (\_SB.PWRB, 0x02) // Device Wake
> > > >         }
> > > >
> > > >         Method (_L08, 0, NotSerialized)  // _Lxx: Level-Triggered GPE, xx=0x00-0xFF
> > > >         {
> > > >             Notify (\_SB.PCI0.GP18, 0x02) // Device Wake
> > > >             Notify (\_SB.PCI0.GPP0, 0x02) // Device Wake
> > > >             Notify (\_SB.PCI0.GPP1, 0x02) // Device Wake
> > > >             Notify (\_SB.PCI0.GPP5, 0x02) // Device Wake
> > > >         }
> > > >     }
> > > >
> > > > At least one more is even significantly more extensive, calling methods
> > > > that interact with OpRegions.  So how does a simple stub of a GPE block
> > > > replicate this sort of behavior in the host AML?  Thanks,
>
> The simple GPE block stub will replicate the ACPI notification only: as
> mentioned earlier, a GPE handler will be generated per vfio device, so in
> your example, assuming that only:
> - \_SB.PCI0.GPP0.PEGP
> - \_SB.PCI0.GP17.XHC1
> are passed through to the guest, the generated AML code for the VM will
> look more or less like below:
>
>     Scope (_GPE)
>     {
>         Method (_E00, 0, NotSerialized)
>         {
>             Local0 = \_SB.PCI0.GPP0.PEGP.NOTY
>             Notify (\_SB.PCI0.GPP0.PEGP, Local0)
>         }
>     }
>     Scope (_GPE)
>     {
>         Method (_E01, 0, NotSerialized)
>         {
>             Local0 = \_SB.PCI0.GP17.XHC1.NOTY
>             Notify (\_SB.PCI0.GP17.XHC1, Local0)
>         }
>     }
>
> So each passed-through device will have an associated GPE (0 for
> \_SB.PCI0.GPP0.PEGP and 1 for \_SB.PCI0.GP17.XHC1).  The path in Notify
> could actually be different and related to the guest PCI hierarchy (but
> associated with those host devices).  Please also note that in this case
> we use the GPE only in order to "inform" the guest that a notification
> has arrived; we do not try to replicate the host GPE scope description.
>
> Above we assumed that other devices (like \_SB.PCI0.GPP0/1) are not
> passed through to the guest, so their notifications are handled in the
> host as usual (they are not bound to vfio-pci) and there is no need to
> generate AML code, allocate a GPE for them, and so on.
>
> Thank you,
> Grzegorz
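
To make the quoted AML above easier to follow, here is a purely
hypothetical sketch (not the actual crosvm implementation; every helper
name is made up) of the VMM-side counterpart: on a netlink event for an
assigned device, the VMM stores the notification value in the guest memory
backing that device's NOTY field and then raises the device's pre-allocated
GPE, so the guest's _Exx handler issues the corresponding Notify.

    /* Hypothetical VMM-side sketch of propagating a host ACPI notification
     * to the guest through the per-device GPE stub shown above.  The helpers
     * are placeholders for whatever services a real VMM provides. */
    #include <stdint.h>
    #include <stdio.h>

    static void guest_write_u32(uint64_t gpa, uint32_t val)
    {
            /* Placeholder: a real VMM would write into the guest address
             * space backing the NOTY OperationRegion. */
            printf("write 0x%x to guest physical address 0x%llx\n",
                   (unsigned)val, (unsigned long long)gpa);
    }

    static void inject_gpe(unsigned int gpe)
    {
            /* Placeholder: a real VMM would set the bit in its emulated
             * GPE0_STS register and assert the SCI so the guest runs _Exx. */
            printf("raise GPE 0x%02x\n", gpe);
    }

    static void forward_acpi_notify(uint64_t noty_gpa, unsigned int gpe,
                                    uint32_t notify_val)
    {
            guest_write_u32(noty_gpa, notify_val);  /* 1. stash notify value  */
            inject_gpe(gpe);                        /* 2. signal guest via GPE */
    }

    int main(void)
    {
            /* Example: notification 0x81 for a device whose NOTY field is
             * backed at (made-up) guest address 0xfeed0000, GPE 0. */
            forward_acpi_notify(0xfeed0000, 0, 0x81);
            return 0;
    }

The ordering matters: the notification value has to be visible to the guest
before the GPE fires, since the _Exx handler first reads NOTY and only then
issues the Notify.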