Hi Alex, On 4/19/2023 2:38 PM, Alex Williamson wrote: > On Wed, 19 Apr 2023 11:13:29 -0700 > Reinette Chatre <reinette.chatre@xxxxxxxxx> wrote: >> On 4/18/2023 3:38 PM, Alex Williamson wrote: >>> On Tue, 18 Apr 2023 10:29:20 -0700 >>> Reinette Chatre <reinette.chatre@xxxxxxxxx> wrote: >>> >>>> Recently introduced pci_msix_alloc_irq_at() and pci_msix_free_irq() >>>> enables an individual MSI-X interrupt to be allocated and freed after >>>> MSI-X enabling. >>>> >>>> Use dynamic MSI-X (if supported by the device) to allocate an interrupt >>>> after MSI-X is enabled. An MSI-X interrupt is dynamically allocated at >>>> the time a valid eventfd is assigned. This is different behavior from >>>> a range provided during MSI-X enabling where interrupts are allocated >>>> for the entire range whether a valid eventfd is provided for each >>>> interrupt or not. >>>> >>>> Do not dynamically free interrupts, leave that to when MSI-X is >>>> disabled. >>> >>> But we do, sometimes, even if essentially only on the error path. Is >>> that worthwhile? It seems like we could entirely remove >>> vfio_msi_free_irq() and rely only on pci_free_irq_vectors() on MSI/X >>> teardown. >> >> Yes, it is only on the error path where dynamic MSI-X interrupts are >> removed. I do not know how to determine if it is worthwhile. On the >> kernel side failure seems unlikely since it would mean memory cannot >> be allocated or request_irq() failed. In these cases it may not be >> worthwhile since user space may try again and having the interrupt >> already allocated would be helpful. The remaining error seems to be >> if user space provided an invalid eventfd. An allocation in response >> to wrong user input is a concern to me. Should we consider >> buggy/malicious users? I am uncertain here so would defer to your >> guidance. > > I don't really see that a malicious user can exploit anything here, > their irq allocation is bound by the device support and they're > entitled to make use of the full vector set of the device by virtue of > having ownership of the device. All the MSI-X allocated irqs are freed > when the interrupt mode is changed or the device is released regardless. > > The end result is also no different than if the user had not configured > the vector when enabling MSI-X or configured it and later de-configured > with a -1 eventfd. The irq is allocated but not attached to a ctx. > We're intentionally using this as a cache. > > Also, as implemented here in v3, we might be freeing from the original > allocation rather than a new, dynamically allocated irq. Great point. > > My thinking is that if we think there's a benefit to caching any > allocated irqs, we should do so consistently. We don't currently know > if the irq was allocated now or previously. Tracking that would add > complexity for little benefit. The user can get to the same end result > of an allocated, unused irq in numerous way, the state itself is not > erroneous, and is actually in support of caching irq allocations. > Removing the de-allocation of a single vector further simplifies the > code as there exists only one path where irqs are freed, ie. > pci_free_irq_vectors(). > > So I'd lean towards removing vfio_msi_free_irq(). Thank you for your detailed analysis. I understand and agree. I will remove vfio_msi_free_irq(). > >>> I'd probably also add a comment in the commit log about the theory >>> behind not dynamically freeing irqs, ie. latency, reliability, and >>> whatever else we used to justify it. Thanks, >> >> Sure. How about something like below to replace the final sentence of >> the changelog: >> >> "When a guest disables an interrupt, user space (Qemu) does not >> disable the interrupt but instead assigns it a different trigger. A >> common flow is thus for the VFIO_DEVICE_SET_IRQS ioctl() to be >> used to assign a new eventfd to an already enabled interrupt. Freeing >> and re-allocating an interrupt in this scenario would add unnecessary >> latency as well as uncertainty since the re-allocation may fail. Do >> not dynamically free interrupts when an interrupt is disabled, instead >> support a subsequent re-enable to draw from the initial allocation when >> possible. Leave freeing of interrupts to when MSI-X is disabled." > > There are other means besides caching irqs that could achieve the above, > for instance if a trigger is simply swapped from one eventfd to another, > that all happens within vfio_msi_set_vector_signal() where we could > hold the irq for the transition. > > I think I might justify it as: > > The PCI-MSIX API requires that some number of irqs are > allocated for an initial set of vectors when enabling MSI-X on > the device. When dynamic MSIX allocation is not supported, the > vector table, and thus the allocated irq set can only be resized > by disabling and re-enabling MSIX with a different range. In > that case the irq allocation is essentially a cache for > configuring vectors within the previously allocated vector > range. When dynamic MSIX allocation is supported, the API > still requires some initial set of irqs to be allocated, but > also supports allocating and freeing specific irq vectors both > within and beyond the initially allocated range. > > For consistency between modes, as well as to reduce latency and > improve reliability of allocations, and also simplicity, this > implementation only releases irqs via pci_free_irq_vectors() > when either the interrupt mode changes or the device is > released. > > Does that cover the key points for someone that might want to revisit > this decision later? Thanks, It does so clearly, yes. Thank you so much for taking the time to write this. I will include it in the changelog. Reinette