Re: [PATCH V3 09/10] vfio/pci: Support dynamic MSI-X

Reinette Chatre <reinette.chatre@xxxxxxxxx> · Wed, 19 Apr 2023 15:03:15 -0700

Hi Alex,

On 4/19/2023 2:38 PM, Alex Williamson wrote:
> On Wed, 19 Apr 2023 11:13:29 -0700
> Reinette Chatre <reinette.chatre@xxxxxxxxx> wrote:
>> On 4/18/2023 3:38 PM, Alex Williamson wrote:
>>> On Tue, 18 Apr 2023 10:29:20 -0700
>>> Reinette Chatre <reinette.chatre@xxxxxxxxx> wrote:
>>>   
>>>> Recently introduced pci_msix_alloc_irq_at() and pci_msix_free_irq()
>>>> enables an individual MSI-X interrupt to be allocated and freed after
>>>> MSI-X enabling.
>>>>
>>>> Use dynamic MSI-X (if supported by the device) to allocate an interrupt
>>>> after MSI-X is enabled. An MSI-X interrupt is dynamically allocated at
>>>> the time a valid eventfd is assigned. This is different behavior from
>>>> a range provided during MSI-X enabling where interrupts are allocated
>>>> for the entire range whether a valid eventfd is provided for each
>>>> interrupt or not.
>>>>
>>>> Do not dynamically free interrupts, leave that to when MSI-X is
>>>> disabled.  
>>>
>>> But we do, sometimes, even if essentially only on the error path.  Is
>>> that worthwhile?  It seems like we could entirely remove
>>> vfio_msi_free_irq() and rely only on pci_free_irq_vectors() on MSI/X
>>> teardown.  
>>
>> Yes, it is only on the error path where dynamic MSI-X interrupts are
>> removed. I do not know how to determine if it is worthwhile. On the
>> kernel side failure seems unlikely since it would mean memory cannot
>> be allocated or request_irq() failed. In these cases it may not be
>> worthwhile since user space may try again and having the interrupt
>> already allocated would be helpful. The remaining error seems to be
>> if user space provided an invalid eventfd. An allocation in response
>> to wrong user input is a concern to me. Should we consider
>> buggy/malicious users? I am uncertain here so would defer to your
>> guidance.
> 
> I don't really see that a malicious user can exploit anything here,
> their irq allocation is bound by the device support and they're
> entitled to make use of the full vector set of the device by virtue of
> having ownership of the device.  All the MSI-X allocated irqs are freed
> when the interrupt mode is changed or the device is released regardless.
> 
> The end result is also no different than if the user had not configured
> the vector when enabling MSI-X or configured it and later de-configured
> with a -1 eventfd.  The irq is allocated but not attached to a ctx.
> We're intentionally using this as a cache.
> 
> Also, as implemented here in v3, we might be freeing from the original
> allocation rather than a new, dynamically allocated irq.

Great point.

> 
> My thinking is that if we think there's a benefit to caching any
> allocated irqs, we should do so consistently.  We don't currently know
> if the irq was allocated now or previously.  Tracking that would add
> complexity for little benefit.  The user can get to the same end result
> of an allocated, unused irq in numerous way, the state itself is not
> erroneous, and is actually in support of caching irq allocations.
> Removing the de-allocation of a single vector further simplifies the
> code as there exists only one path where irqs are freed, ie.
> pci_free_irq_vectors().
> 
> So I'd lean towards removing vfio_msi_free_irq().

Thank you for your detailed analysis. I understand and agree.
I will remove vfio_msi_free_irq().

>  
>>> I'd probably also add a comment in the commit log about the theory
>>> behind not dynamically freeing irqs, ie. latency, reliability, and
>>> whatever else we used to justify it.  Thanks,  
>>
>> Sure. How about something like below to replace the final sentence of
>> the changelog:
>>
>> "When a guest disables an interrupt, user space (Qemu) does not
>> disable the interrupt but instead assigns it a different trigger. A
>> common flow is thus for the VFIO_DEVICE_SET_IRQS ioctl() to be 
>> used to assign a new eventfd to an already enabled interrupt. Freeing
>> and re-allocating an interrupt in this scenario would add unnecessary
>> latency as well as uncertainty since the re-allocation may fail. Do
>> not dynamically free interrupts when an interrupt is disabled, instead
>> support a subsequent re-enable to draw from the initial allocation when
>> possible. Leave freeing of interrupts to when MSI-X is disabled."
> 
> There are other means besides caching irqs that could achieve the above,
> for instance if a trigger is simply swapped from one eventfd to another,
> that all happens within vfio_msi_set_vector_signal() where we could
> hold the irq for the transition.
> 
> I think I might justify it as:
> 
> 	The PCI-MSIX API requires that some number of irqs are
> 	allocated for an initial set of vectors when enabling MSI-X on
> 	the device.  When dynamic MSIX allocation is not supported, the
> 	vector table, and thus the allocated irq set can only be resized
> 	by disabling and re-enabling MSIX with a different range.  In
> 	that case the irq allocation is essentially a cache for
> 	configuring vectors within the previously allocated vector
> 	range.  When dynamic MSIX allocation is supported, the API
> 	still requires some initial set of irqs to be allocated, but
> 	also supports allocating and freeing specific irq vectors both
> 	within and beyond the initially allocated range.
> 
> 	For consistency between modes, as well as to reduce latency and
> 	improve reliability of allocations, and also simplicity, this
> 	implementation only releases irqs via pci_free_irq_vectors()
> 	when either the interrupt mode changes or the device is
> 	released.
> 
> Does that cover the key points for someone that might want to revisit
> this decision later?  Thanks,

It does so clearly, yes. Thank you so much for taking the time to
write this. I will include it in the changelog.

Reinette