> From: Chatre, Reinette <reinette.chatre@xxxxxxxxx> > Sent: Saturday, April 29, 2023 2:35 AM > > Hi Kevin, > > On 4/27/2023 11:50 PM, Tian, Kevin wrote: > >> From: Chatre, Reinette <reinette.chatre@xxxxxxxxx> > >> Sent: Friday, April 28, 2023 1:36 AM > >> > >> pci_msix_alloc_irq_at() enables an individual MSI-X interrupt to be > >> allocated after MSI-X enabling. > >> > >> Use dynamic MSI-X (if supported by the device) to allocate an interrupt > >> after MSI-X is enabled. An MSI-X interrupt is dynamically allocated at > >> the time a valid eventfd is assigned. This is different behavior from > >> a range provided during MSI-X enabling where interrupts are allocated > >> for the entire range whether a valid eventfd is provided for each > >> interrupt or not. > >> > >> The PCI-MSIX API requires that some number of irqs are allocated for > >> an initial set of vectors when enabling MSI-X on the device. When > >> dynamic MSIX allocation is not supported, the vector table, and thus > >> the allocated irq set can only be resized by disabling and re-enabling > >> MSI-X with a different range. In that case the irq allocation is > >> essentially a cache for configuring vectors within the previously > >> allocated vector range. When dynamic MSI-X allocation is supported, > >> the API still requires some initial set of irqs to be allocated, but > >> also supports allocating and freeing specific irq vectors both > >> within and beyond the initially allocated range. > >> > >> For consistency between modes, as well as to reduce latency and improve > >> reliability of allocations, and also simplicity, this implementation > >> only releases irqs via pci_free_irq_vectors() when either the interrupt > >> mode changes or the device is released. > > > > It improves the reliability of allocations from the calling device p.o.v. > > > > But system-wide this is not efficient use of irqs and not releasing them > > timely may affect the reliability of allocations for other devices. > > Could you please elaborate how other devices may be impacted? the more this devices reserves the less remains for others, e.g. irte entries. > > > Should this behavior be something configurable? > > This is not clear to me and I look to you for guidance here. From practical > side it looks like configuration via module parameters is supported but > whether it should be done is not clear to me. > > When considering this we need to think about what the user may expect > when > turning on/off the configuration. For example, MSI-X continues to allocate a > range of interrupts during enabling. These have always been treated as a > "cache" (interrupts remain allocated, whether they have an associated > trigger > or not). If there is new configurable behavior, do you expect that the > driver needs to distinguish between the original "cache" that the user is > used to and the new dynamic allocations? That is, should a dynamic MSI-X > capable device always free interrupts when user space removes an eventfd > or should only interrupts that were allocated dynamically be freed > dynamically? That looks tricky. Probably that is why Alex suggested doing this simple scheme and it is on par with the old logic anyway. So I'll withdraw this comment. > > >> +/* > >> + * Return Linux IRQ number of an MSI or MSI-X device interrupt vector. > >> + * If a Linux IRQ number is not available then a new interrupt will be > >> + * allocated if dynamic MSI-X is supported. > >> + */ > >> +static int vfio_msi_alloc_irq(struct vfio_pci_core_device *vdev, > >> + unsigned int vector, bool msix) > >> +{ > >> + struct pci_dev *pdev = vdev->pdev; > >> + struct msi_map map; > >> + int irq; > >> + u16 cmd; > >> + > >> + irq = pci_irq_vector(pdev, vector); > >> + if (irq > 0 || !msix || !vdev->has_dyn_msix) > >> + return irq; > > > > if (irq >= 0 || ...) > > > > I am not sure about this request because pci_irq_vector() cannot return 0. > The Linux interrupt number will be > 0 on success. 0 means "not found" > (see msi_get_virq()), which is translated to -EINVAL by pci_irq_vector(). > There is a subtle difference between the description and the code of pci_irq_vector(). /** * pci_irq_vector() - Get Linux IRQ number of a device interrupt vector * @dev: the PCI device to operate on * @nr: device-relative interrupt vector index (0-based); has different * meanings, depending on interrupt mode: * * * MSI-X the index in the MSI-X vector table * * MSI the index of the enabled MSI vectors * * INTx must be 0 * * Return: the Linux IRQ number, or -EINVAL if @nr is out of range */ >From above '0' is a valid irq number. then in following code: irq = msi_get_virq(&dev->dev, nr); return irq ? irq : -EINVAL; '0' is obviously invalid for msi. I didn't realize the msi part when reading the patch. It left me in confusion that '0' is unhandled as here we only check ">0" while in other places "-EINVAL" is checked. Not big matter but it sounds slightly clearer to me to follow the description of pci_irq_vector() instead of its internal detail.