Re: [RFC PATCH 0/8] vfio/pci: Support dynamic allocation of MSI-X interrupts

Reinette Chatre <reinette.chatre@xxxxxxxxx> · Thu, 16 Mar 2023 16:38:29 -0700

Hi Alex,

On 3/16/2023 2:56 PM, Alex Williamson wrote:
> On Wed, 15 Mar 2023 13:59:20 -0700 Reinette Chatre
> <reinette.chatre@xxxxxxxxx> wrote:
> 

...

>> == Why is this an RFC ? ==
>> 
>> vfio support for dynamic MSI-X needs to work with existing user
>> space as well as upcoming user space that takes advantage of this
>> feature. I would appreciate guidance on the expectations and
>> requirements surrounding error handling when considering existing
>> user space.
>> 
>> For example, consider the following scenario: Start: Consider a
>> passthrough device that supports 10 MSI-X interrupts (0 to 9) and
>> existing Qemu allocated interrupts 0 to 4.
>> 
>> Scenario: Qemu (hypothetically) attempts to associate a new action
>> to interrupts 0 to 7. Early checking of this range in 
>> vfio_set_irqs_validate_and_prepare() will pass since it is a valid
>> range for the device. What happens after the early checking is
>> considered next:
>> 
>> Current behavior (before this series): Since the provided range, 0
>> - 7, exceeds the allocated range, no action will be taken on
>> existing allocated interrupts 0 - 4 and the Qemu request will fail
>> without making any state changes.
> 
> I must be missing something, it was described correctly earlier that 
> QEMU will disable MSI-X and re-enable with a new vector count.  Not 
> only does QEMU not really have a way to fail this change, pretty
> much nothing would work if we did.

Thank you very much for confirming Qemu behavior. 

One of my goals is to ensure that these kernel changes do not break
existing user space in any way. The only area I could find where
existing user space may encounter new behavior is if user space makes
the (pre dynamic MSI-X) mistake of attempting to associate triggers
to interrupts that have not been allocated. Thank you for confirming
that this is not a valid scenario for Qemu.

> What happens in this case is that the QEMU vfio-pci driver gets a 
> vector_use callback for one of these new vectors {5,6,7}, 
> vfio_msix_vector_do_use() checks that's greater than we have
> enabled, disables MSI-X, and re-enables it for the new vector count.
> Worst case we'll do that 3 times if the vectors are presented in
> ascending order.

Indeed. While testing this work I modified Qemu's vfio_msix_vector_do_use() to
consider VFIO_IRQ_INFO_NORESIZE. I saw the vector_use callback arriving
for each new vector and having Qemu just associate a new action with the
new vector (call vfio_set_irq_signaling()) instead of disabling MSI-X followed
by enabling all vectors worked well with the changes I present here. A single
interrupt was dynamically allocated and enabled without impacting others.

> Based on above, there really can never be an error if we expect the 
> device to work, so I think there's a misread of the current status. 
> Dynamic MSI-X support should simply reduce the disruption and chance
> of lost interrupts at the device, but the points where we risk that
> the host cannot provide the configuration we need are the same.

Thank you very much Alex. In this case, please do consider this
submission as a submission for inclusion. I'd be happy to resubmit
without the "RFC" prefix if that is preferred.

Reinette