On 2023/12/4 18:28, Roger Pau Monné wrote: > On Fri, Dec 01, 2023 at 07:37:55PM -0800, Stefano Stabellini wrote: >> On Fri, 1 Dec 2023, Roger Pau Monné wrote: >>> On Thu, Nov 30, 2023 at 07:15:17PM -0800, Stefano Stabellini wrote: >>>> On Thu, 30 Nov 2023, Roger Pau Monné wrote: >>>>> On Wed, Nov 29, 2023 at 07:53:59PM -0800, Stefano Stabellini wrote: >>>>>> On Fri, 24 Nov 2023, Jiqian Chen wrote: >>>>>>> This patch is to solve two problems we encountered when we try to >>>>>>> passthrough a device to hvm domU base on Xen PVH dom0. >>>>>>> >>>>>>> First, hvm guest will alloc a pirq and irq for a passthrough device >>>>>>> by using gsi, before that, the gsi must first has a mapping in dom0, >>>>>>> see Xen code pci_add_dm_done->xc_domain_irq_permission, it will call >>>>>>> into Xen and check whether dom0 has the mapping. See >>>>>>> XEN_DOMCTL_irq_permission->pirq_access_permitted, "current" is PVH >>>>>>> dom0 and it return irq is 0, and then return -EPERM. >>>>>>> This is because the passthrough device doesn't do PHYSDEVOP_map_pirq >>>>>>> when thay are enabled. >>>>>>> >>>>>>> Second, in PVH dom0, the gsi of a passthrough device doesn't get >>>>>>> registered, but gsi must be configured for it to be able to be >>>>>>> mapped into a domU. >>>>>>> >>>>>>> After searching codes, we can find map_pirq and register_gsi will be >>>>>>> done in function vioapic_write_redirent->vioapic_hwdom_map_gsi when >>>>>>> the gsi(aka ioapic's pin) is unmasked in PVH dom0. So the problems >>>>>>> can be conclude to that the gsi of a passthrough device doesn't be >>>>>>> unmasked. >>>>>>> >>>>>>> To solve the unmaske problem, this patch call the unmask_irq when we >>>>>>> assign a device to be passthrough. So that the gsi can get registered >>>>>>> and mapped in PVH dom0. >>>>>> >>>>>> >>>>>> Roger, this seems to be more of a Xen issue than a Linux issue. Why do >>>>>> we need the unmask check in Xen? Couldn't we just do: >>>>>> >>>>>> >>>>>> diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c >>>>>> index 4e40d3609a..df262a4a18 100644 >>>>>> --- a/xen/arch/x86/hvm/vioapic.c >>>>>> +++ b/xen/arch/x86/hvm/vioapic.c >>>>>> @@ -287,7 +287,7 @@ static void vioapic_write_redirent( >>>>>> hvm_dpci_eoi(d, gsi); >>>>>> } >>>>>> >>>>>> - if ( is_hardware_domain(d) && unmasked ) >>>>>> + if ( is_hardware_domain(d) ) >>>>>> { >>>>>> /* >>>>>> * NB: don't call vioapic_hwdom_map_gsi while holding hvm.irq_lock >>>>> >>>>> There are some issues with this approach. >>>>> >>>>> mp_register_gsi() will only setup the trigger and polarity of the >>>>> IO-APIC pin once, so we do so once the guest unmask the pin in order >>>>> to assert that the configuration is the intended one. A guest is >>>>> allowed to write all kind of nonsense stuff to the IO-APIC RTE, but >>>>> that doesn't take effect unless the pin is unmasked. >>>>> >>>>> Overall the question would be whether we have any guarantees that >>>>> the hardware domain has properly configured the pin, even if it's not >>>>> using it itself (as it hasn't been unmasked). >>>>> >>>>> IIRC PCI legacy interrupts are level triggered and low polarity, so we >>>>> could configure any pins that are not setup at bind time? >>>> >>>> That could work. >>>> >>>> Another idea is to move only the call to allocate_and_map_gsi_pirq at >>>> bind time? That might be enough to pass a pirq_access_permitted check. >>> >>> Maybe, albeit that would change the behavior of XEN_DOMCTL_bind_pt_irq >>> just for PT_IRQ_TYPE_PCI and only when called from a PVH dom0 (as the >>> parameter would be a GSI instead of a previously mapped IRQ). Such >>> difference just for PT_IRQ_TYPE_PCI is slightly weird - if we go that >>> route I would recommend that we instead introduce a new dmop that has >>> this syntax regardless of the domain type it's called from. >> >> Looking at the code it is certainly a bit confusing. My point was that >> we don't need to wait until polarity and trigger are set appropriately >> to allow Dom0 to pass successfully a pirq_access_permitted() check. Xen >> should be able to figure out that Dom0 is permitted pirq access. > > The logic is certainly not straightforward, and it could benefit from > some comments. > > The irq permissions are a bit special, in that they get setup when the > IRQ is mapped. > > The problem however is not so much with IRQ permissions, that we can > indeed sort out internally in Xen. Such check in dom0 has the side > effect of preventing the IRQ from being assigned to a domU without the > hardware source being properly configured AFAICT. > >> >> So the idea was to move the call to allocate_and_map_gsi_pirq() earlier >> somewhere because allocate_and_map_gsi_pirq doesn't require trigger or >> polarity to be configured to work. But the suggestion of doing it a >> "bind time" (meaning: XEN_DOMCTL_bind_pt_irq) was a bad idea. >> >> But maybe we can find another location, maybe within >> xen/arch/x86/hvm/vioapic.c, to call allocate_and_map_gsi_pirq() before >> trigger and polarity are set and before the interrupt is unmasked. >> >> Then we change the implementation of vioapic_hwdom_map_gsi to skip the >> call to allocate_and_map_gsi_pirq, because by the time >> vioapic_hwdom_map_gsi we assume that allocate_and_map_gsi_pirq had >> already been done. > > But then we would end up in a situation where the > pirq_access_permitted() check will pass, but the IO-APIC pin won't be > configured, which I think it's not what we want. > > One option would be to allow mp_register_gsi() to be called multiple > times, and update the IO-APIC pin configuration as long as the pin is > not unmasked. That would propagate each dom0 RTE update to the > underlying IO-APIC. However such approach relies on dom0 configuring > all possible IO-APIC pins, even if no device on dom0 is using them, I > think it's not a very reliable option. > > Another option would be to modify the toolstack to setup the GSI > itself using the PHYSDEVOP_setup_gsi hypercall. As said in a previous > email, since we only care about PCI device passthrough the legacy INTx > should always be level triggered and low polarity. I am thinking if we can do PHYSDEVOP_map_pirq and PHYSDEVOP_setup_gsi only for passthrough devices(in function pcistub_init_device). Then it can pass permission check and setup gsi without affecting other devices that do not use IO-APIC pins. What do you think? > >> I am not familiar with vioapic.c but to give you an idea of what I was >> thinking: >> >> >> diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c >> index 4e40d3609a..16d56fe851 100644 >> --- a/xen/arch/x86/hvm/vioapic.c >> +++ b/xen/arch/x86/hvm/vioapic.c >> @@ -189,14 +189,6 @@ static int vioapic_hwdom_map_gsi(unsigned int gsi, unsigned int trig, >> return ret; >> } >> >> - ret = allocate_and_map_gsi_pirq(currd, pirq, &pirq); >> - if ( ret ) >> - { >> - gprintk(XENLOG_WARNING, "vioapic: error mapping GSI %u: %d\n", >> - gsi, ret); >> - return ret; >> - } >> - >> pcidevs_lock(); >> ret = pt_irq_create_bind(currd, &pt_irq_bind); >> if ( ret ) >> @@ -287,6 +279,17 @@ static void vioapic_write_redirent( >> hvm_dpci_eoi(d, gsi); >> } >> >> + if ( is_hardware_domain(d) ) >> + { >> + int pirq = gsi, ret; >> + ret = allocate_and_map_gsi_pirq(currd, pirq, &pirq); >> + if ( ret ) >> + { >> + gprintk(XENLOG_WARNING, "vioapic: error mapping GSI %u: %d\n", >> + gsi, ret); >> + return ret; >> + } >> + } >> if ( is_hardware_domain(d) && unmasked ) >> { >> /* > > As said above, such approach relies on dom0 writing to the IO-APIC RTE > of likely each IO-APIC pin, which is IMO not quite reliable. In there > are two different issues here that need to be fixed for PVH dom0: > > - Fix the XEN_DOMCTL_irq_permission pirq_access_permitted() call to > succeed for a PVH dom0, even if dom0 is not using the GSI itself. > > - Configure IO-APIC pins for PCI interrupts even if dom0 is not using > the IO-APIC pin itself. > > First one needs to be fixed internally in Xen, second one will require > the toolstack to issue an extra hypercall in order to ensure the > IO-APIC pin is properly configured. > > Thanks, Roger. -- Best regards, Jiqian Chen.