Re: [RFC KERNEL PATCH v2 2/3] xen/pvh: Unmask irq for passthrough device in PVH dom0

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2023/12/4 18:28, Roger Pau Monné wrote:
> On Fri, Dec 01, 2023 at 07:37:55PM -0800, Stefano Stabellini wrote:
>> On Fri, 1 Dec 2023, Roger Pau Monné wrote:
>>> On Thu, Nov 30, 2023 at 07:15:17PM -0800, Stefano Stabellini wrote:
>>>> On Thu, 30 Nov 2023, Roger Pau Monné wrote:
>>>>> On Wed, Nov 29, 2023 at 07:53:59PM -0800, Stefano Stabellini wrote:
>>>>>> On Fri, 24 Nov 2023, Jiqian Chen wrote:
>>>>>>> This patch is to solve two problems we encountered when we try to
>>>>>>> passthrough a device to hvm domU base on Xen PVH dom0.
>>>>>>>
>>>>>>> First, hvm guest will alloc a pirq and irq for a passthrough device
>>>>>>> by using gsi, before that, the gsi must first has a mapping in dom0,
>>>>>>> see Xen code pci_add_dm_done->xc_domain_irq_permission, it will call
>>>>>>> into Xen and check whether dom0 has the mapping. See
>>>>>>> XEN_DOMCTL_irq_permission->pirq_access_permitted, "current" is PVH
>>>>>>> dom0 and it return irq is 0, and then return -EPERM.
>>>>>>> This is because the passthrough device doesn't do PHYSDEVOP_map_pirq
>>>>>>> when thay are enabled.
>>>>>>>
>>>>>>> Second, in PVH dom0, the gsi of a passthrough device doesn't get
>>>>>>> registered, but gsi must be configured for it to be able to be
>>>>>>> mapped into a domU.
>>>>>>>
>>>>>>> After searching codes, we can find map_pirq and register_gsi will be
>>>>>>> done in function vioapic_write_redirent->vioapic_hwdom_map_gsi when
>>>>>>> the gsi(aka ioapic's pin) is unmasked in PVH dom0. So the problems
>>>>>>> can be conclude to that the gsi of a passthrough device doesn't be
>>>>>>> unmasked.
>>>>>>>
>>>>>>> To solve the unmaske problem, this patch call the unmask_irq when we
>>>>>>> assign a device to be passthrough. So that the gsi can get registered
>>>>>>> and mapped in PVH dom0.
>>>>>>
>>>>>>
>>>>>> Roger, this seems to be more of a Xen issue than a Linux issue. Why do
>>>>>> we need the unmask check in Xen? Couldn't we just do:
>>>>>>
>>>>>>
>>>>>> diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
>>>>>> index 4e40d3609a..df262a4a18 100644
>>>>>> --- a/xen/arch/x86/hvm/vioapic.c
>>>>>> +++ b/xen/arch/x86/hvm/vioapic.c
>>>>>> @@ -287,7 +287,7 @@ static void vioapic_write_redirent(
>>>>>>              hvm_dpci_eoi(d, gsi);
>>>>>>      }
>>>>>>  
>>>>>> -    if ( is_hardware_domain(d) && unmasked )
>>>>>> +    if ( is_hardware_domain(d) )
>>>>>>      {
>>>>>>          /*
>>>>>>           * NB: don't call vioapic_hwdom_map_gsi while holding hvm.irq_lock
>>>>>
>>>>> There are some issues with this approach.
>>>>>
>>>>> mp_register_gsi() will only setup the trigger and polarity of the
>>>>> IO-APIC pin once, so we do so once the guest unmask the pin in order
>>>>> to assert that the configuration is the intended one.  A guest is
>>>>> allowed to write all kind of nonsense stuff to the IO-APIC RTE, but
>>>>> that doesn't take effect unless the pin is unmasked.
>>>>>
>>>>> Overall the question would be whether we have any guarantees that
>>>>> the hardware domain has properly configured the pin, even if it's not
>>>>> using it itself (as it hasn't been unmasked).
>>>>>
>>>>> IIRC PCI legacy interrupts are level triggered and low polarity, so we
>>>>> could configure any pins that are not setup at bind time?
>>>>
>>>> That could work.
>>>>
>>>> Another idea is to move only the call to allocate_and_map_gsi_pirq at
>>>> bind time? That might be enough to pass a pirq_access_permitted check.
>>>
>>> Maybe, albeit that would change the behavior of XEN_DOMCTL_bind_pt_irq
>>> just for PT_IRQ_TYPE_PCI and only when called from a PVH dom0 (as the
>>> parameter would be a GSI instead of a previously mapped IRQ).  Such
>>> difference just for PT_IRQ_TYPE_PCI is slightly weird - if we go that
>>> route I would recommend that we instead introduce a new dmop that has
>>> this syntax regardless of the domain type it's called from.
>>
>> Looking at the code it is certainly a bit confusing. My point was that
>> we don't need to wait until polarity and trigger are set appropriately
>> to allow Dom0 to pass successfully a pirq_access_permitted() check. Xen
>> should be able to figure out that Dom0 is permitted pirq access.
> 
> The logic is certainly not straightforward, and it could benefit from
> some comments.
> 
> The irq permissions are a bit special, in that they get setup when the
> IRQ is mapped.
> 
> The problem however is not so much with IRQ permissions, that we can
> indeed sort out internally in Xen.  Such check in dom0 has the side
> effect of preventing the IRQ from being assigned to a domU without the
> hardware source being properly configured AFAICT.
> 
>>
>> So the idea was to move the call to allocate_and_map_gsi_pirq() earlier
>> somewhere because allocate_and_map_gsi_pirq doesn't require trigger or
>> polarity to be configured to work. But the suggestion of doing it a
>> "bind time" (meaning: XEN_DOMCTL_bind_pt_irq) was a bad idea.
>>
>> But maybe we can find another location, maybe within
>> xen/arch/x86/hvm/vioapic.c, to call allocate_and_map_gsi_pirq() before
>> trigger and polarity are set and before the interrupt is unmasked.
>>
>> Then we change the implementation of vioapic_hwdom_map_gsi to skip the
>> call to allocate_and_map_gsi_pirq, because by the time
>> vioapic_hwdom_map_gsi we assume that allocate_and_map_gsi_pirq had
>> already been done.
> 
> But then we would end up in a situation where the
> pirq_access_permitted() check will pass, but the IO-APIC pin won't be
> configured, which I think it's not what we want.
> 
> One option would be to allow mp_register_gsi() to be called multiple
> times, and update the IO-APIC pin configuration as long as the pin is
> not unmasked.  That would propagate each dom0 RTE update to the
> underlying IO-APIC.  However such approach relies on dom0 configuring
> all possible IO-APIC pins, even if no device on dom0 is using them, I
> think it's not a very reliable option.
> 
> Another option would be to modify the toolstack to setup the GSI
> itself using the PHYSDEVOP_setup_gsi hypercall.  As said in a previous
> email, since we only care about PCI device passthrough the legacy INTx
> should always be level triggered and low polarity.

I am thinking if we can do PHYSDEVOP_map_pirq and PHYSDEVOP_setup_gsi only for passthrough devices(in function pcistub_init_device).
Then it can pass permission check and setup gsi without affecting other devices that do not use IO-APIC pins.
What do you think?

> 
>> I am not familiar with vioapic.c but to give you an idea of what I was
>> thinking:
>>
>>
>> diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
>> index 4e40d3609a..16d56fe851 100644
>> --- a/xen/arch/x86/hvm/vioapic.c
>> +++ b/xen/arch/x86/hvm/vioapic.c
>> @@ -189,14 +189,6 @@ static int vioapic_hwdom_map_gsi(unsigned int gsi, unsigned int trig,
>>          return ret;
>>      }
>>  
>> -    ret = allocate_and_map_gsi_pirq(currd, pirq, &pirq);
>> -    if ( ret )
>> -    {
>> -        gprintk(XENLOG_WARNING, "vioapic: error mapping GSI %u: %d\n",
>> -                 gsi, ret);
>> -        return ret;
>> -    }
>> -
>>      pcidevs_lock();
>>      ret = pt_irq_create_bind(currd, &pt_irq_bind);
>>      if ( ret )
>> @@ -287,6 +279,17 @@ static void vioapic_write_redirent(
>>              hvm_dpci_eoi(d, gsi);
>>      }
>>  
>> +    if ( is_hardware_domain(d) ) 
>> +    {
>> +        int pirq = gsi, ret;
>> +        ret = allocate_and_map_gsi_pirq(currd, pirq, &pirq);
>> +        if ( ret )
>> +        {
>> +            gprintk(XENLOG_WARNING, "vioapic: error mapping GSI %u: %d\n",
>> +                    gsi, ret);
>> +            return ret;
>> +        }
>> +    }
>>      if ( is_hardware_domain(d) && unmasked )
>>      {
>>          /*
> 
> As said above, such approach relies on dom0 writing to the IO-APIC RTE
> of likely each IO-APIC pin, which is IMO not quite reliable.  In there
> are two different issues here that need to be fixed for PVH dom0:
> 
>  - Fix the XEN_DOMCTL_irq_permission pirq_access_permitted() call to
>    succeed for a PVH dom0, even if dom0 is not using the GSI itself.
> 
>  - Configure IO-APIC pins for PCI interrupts even if dom0 is not using
>    the IO-APIC pin itself.
> 
> First one needs to be fixed internally in Xen, second one will require
> the toolstack to issue an extra hypercall in order to ensure the
> IO-APIC pin is properly configured.
> 
> Thanks, Roger.

-- 
Best regards,
Jiqian Chen.




[Index of Archives]     [Linux IBM ACPI]     [Linux Power Management]     [Linux Kernel]     [Linux Laptop]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]
  Powered by Linux