On Tue, 18 Apr 2023 10:29:20 -0700 Reinette Chatre <reinette.chatre@xxxxxxxxx> wrote: > Recently introduced pci_msix_alloc_irq_at() and pci_msix_free_irq() > enables an individual MSI-X interrupt to be allocated and freed after > MSI-X enabling. > > Use dynamic MSI-X (if supported by the device) to allocate an interrupt > after MSI-X is enabled. An MSI-X interrupt is dynamically allocated at > the time a valid eventfd is assigned. This is different behavior from > a range provided during MSI-X enabling where interrupts are allocated > for the entire range whether a valid eventfd is provided for each > interrupt or not. > > Do not dynamically free interrupts, leave that to when MSI-X is > disabled. But we do, sometimes, even if essentially only on the error path. Is that worthwhile? It seems like we could entirely remove vfio_msi_free_irq() and rely only on pci_free_irq_vectors() on MSI/X teardown. I'd probably also add a comment in the commit log about the theory behind not dynamically freeing irqs, ie. latency, reliability, and whatever else we used to justify it. Thanks, Alex > Signed-off-by: Reinette Chatre <reinette.chatre@xxxxxxxxx> > Link: https://lore.kernel.org/lkml/20230403211841.0e206b67.alex.williamson@xxxxxxxxxx/ > --- > The get_cached_msi_msg()/pci_write_msi_msg() behavior is kept > similar to the scenario when MSI-X is enabled with triggers > provided for new interrupts. get_cached_msi_msg()/pci_write_msi_msg() > follows for interrupts recently allocated with pci_msix_alloc_irq_at() > just like get_cached_msi_msg()/pci_write_msi_msg() is done for > interrupts recently allocated with pci_alloc_irq_vectors(). > > Changes since V2: > - Move vfio_irq_ctx_free() to earlier in series to support > earlier usage. (Alex) > - Use consistent terms in changelog: MSI-x changed to MSI-X. > - Make dynamic interrupt context creation generic across all > MSI/MSI-X interrupts. This resulted in code moving to earlier > in series as part of xarray introduction patch. (Alex) > - Remove the local allow_dyn_alloc and direct calling of > pci_msix_can_alloc_dyn(), use the new vdev->has_dyn_msix > introduced earlier instead. (Alex) > - Stop tracking new allocations (remove "new_ctx"). (Alex) > - Introduce new wrapper that returns Linux interrupt number or > dynamically allocate a new interrupt. Wrapper can be used for > all interrupt cases. (Alex) > - Only free dynamic MSI-X interrupts on MSI-X teardown. (Alex) > > Changes since RFC V1: > - Add pointer to interrupt context as function parameter to > vfio_irq_ctx_free(). (Alex) > - Initialize new_ctx to false. (Dan Carpenter) > - Only support dynamic allocation if device supports it. (Alex) > > drivers/vfio/pci/vfio_pci_intrs.c | 73 +++++++++++++++++++++++++++---- > 1 file changed, 65 insertions(+), 8 deletions(-) > > diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c > index bdda7f46c2be..c1a3e224c867 100644 > --- a/drivers/vfio/pci/vfio_pci_intrs.c > +++ b/drivers/vfio/pci/vfio_pci_intrs.c > @@ -372,27 +372,74 @@ static int vfio_msi_enable(struct vfio_pci_core_device *vdev, int nvec, bool msi > return 0; > } > > +/* > + * Return Linux IRQ number of an MSI or MSI-X device interrupt vector. > + * If a Linux IRQ number is not available then a new interrupt will be > + * allocated if dynamic MSI-X is supported. > + */ > +static int vfio_msi_alloc_irq(struct vfio_pci_core_device *vdev, > + unsigned int vector, bool msix) > +{ > + struct pci_dev *pdev = vdev->pdev; > + struct msi_map map; > + int irq; > + u16 cmd; > + > + irq = pci_irq_vector(pdev, vector); > + if (irq > 0 || !msix || !vdev->has_dyn_msix) > + return irq; > + > + cmd = vfio_pci_memory_lock_and_enable(vdev); > + map = pci_msix_alloc_irq_at(pdev, vector, NULL); > + vfio_pci_memory_unlock_and_restore(vdev, cmd); > + > + return map.index < 0 ? map.index : map.virq; > +} > + > +/* > + * Free interrupt if it can be re-allocated dynamically (while MSI-X > + * is enabled). > + */ > +static void vfio_msi_free_irq(struct vfio_pci_core_device *vdev, > + unsigned int vector, bool msix) > +{ > + struct pci_dev *pdev = vdev->pdev; > + struct msi_map map; > + int irq; > + u16 cmd; > + > + if (!msix || !vdev->has_dyn_msix) > + return; > + > + irq = pci_irq_vector(pdev, vector); > + map = (struct msi_map) { .index = vector, .virq = irq }; > + > + if (WARN_ON(irq < 0)) > + return; > + > + cmd = vfio_pci_memory_lock_and_enable(vdev); > + pci_msix_free_irq(pdev, map); > + vfio_pci_memory_unlock_and_restore(vdev, cmd); > +} > + > static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev, > unsigned int vector, int fd, bool msix) > { > struct pci_dev *pdev = vdev->pdev; > struct vfio_pci_irq_ctx *ctx; > struct eventfd_ctx *trigger; > - int irq, ret; > + int irq = -EINVAL, ret; > u16 cmd; > > - irq = pci_irq_vector(pdev, vector); > - if (irq < 0) > - return -EINVAL; > - > ctx = vfio_irq_ctx_get(vdev, vector); > > if (ctx) { > irq_bypass_unregister_producer(&ctx->producer); > - > + irq = pci_irq_vector(pdev, vector); > cmd = vfio_pci_memory_lock_and_enable(vdev); > free_irq(irq, ctx->trigger); > vfio_pci_memory_unlock_and_restore(vdev, cmd); > + /* Interrupt stays allocated, will be freed at MSI-X disable. */ > kfree(ctx->name); > eventfd_ctx_put(ctx->trigger); > vfio_irq_ctx_free(vdev, ctx, vector); > @@ -401,9 +448,17 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev, > if (fd < 0) > return 0; > > + if (irq == -EINVAL) { > + irq = vfio_msi_alloc_irq(vdev, vector, msix); > + if (irq < 0) > + return irq; > + } > + > ctx = vfio_irq_ctx_alloc(vdev, vector); > - if (!ctx) > - return -ENOMEM; > + if (!ctx) { > + ret = -ENOMEM; > + goto out_free_irq; > + } > > ctx->name = kasprintf(GFP_KERNEL_ACCOUNT, "vfio-msi%s[%d](%s)", > msix ? "x" : "", vector, pci_name(pdev)); > @@ -456,6 +511,8 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev, > kfree(ctx->name); > out_free_ctx: > vfio_irq_ctx_free(vdev, ctx, vector); > +out_free_irq: > + vfio_msi_free_irq(vdev, vector, msix); > return ret; > } >