Hi Alex, On 4/3/2023 8:18 PM, Alex Williamson wrote: > On Mon, 3 Apr 2023 15:50:54 -0700 > Reinette Chatre <reinette.chatre@xxxxxxxxx> wrote: >> On 4/3/2023 1:22 PM, Alex Williamson wrote: >>> On Mon, 3 Apr 2023 10:31:23 -0700 >>> Reinette Chatre <reinette.chatre@xxxxxxxxx> wrote: >>>> On 3/31/2023 3:24 PM, Alex Williamson wrote: >>>>> On Fri, 31 Mar 2023 10:49:16 -0700 >>>>> Reinette Chatre <reinette.chatre@xxxxxxxxx> wrote: >>>>>> On 3/30/2023 3:42 PM, Alex Williamson wrote: >>>>>>> On Thu, 30 Mar 2023 16:40:50 -0600 >>>>>>> Alex Williamson <alex.williamson@xxxxxxxxxx> wrote: >>>>>>> >>>>>>>> On Tue, 28 Mar 2023 14:53:34 -0700 >>>>>>>> Reinette Chatre <reinette.chatre@xxxxxxxxx> wrote: >>>>>>>> ... >>> If the goal is to allow the user to swap one eventfd for another, where >>> the result will always be the new eventfd on success or the old eventfd >>> on error, I don't see that this code does that, or that we've ever >>> attempted to make such a guarantee. If the ioctl errors, I think the >>> eventfds are generally deconfigured. We certainly have the unwind code >>> that we discussed earlier that deconfigures all the vectors previously >>> touched in the loop (which seems to be another path where we could >>> de-allocate from the set of initial ctxs). >> >> Thank you for your patience in hearing and addressing my concerns. I plan >> to remove new_ctx in the next version. >> >>>>> devices supporting vdev->has_dyn_msix only ever have active contexts >>>>> allocated? Thanks, >>>> >>>> What do you see as an "active context"? A policy that is currently enforced >>>> is that an allocated context always has an allocated interrupt associated >>>> with it. I do not see how this could be expanded to also require an >>>> enabled interrupt because interrupt enabling requires a trigger that >>>> may not be available. >>> >>> A context is essentially meant to track a trigger, ie. an eventfd >>> provided by the user. In the static case all the irqs are necessarily >>> pre-allocated, therefore we had no reason to consider a dynamic array >>> for the contexts. However, a given context is really only "active" if >>> it has a trigger, otherwise it's just a placeholder. When the >>> placeholder is filled by an eventfd, the pre-allocated irq is enabled. >> >> I see. >> >>> >>> This proposal seems to be a hybrid approach, pre-allocating some >>> initial set of irqs and contexts and expecting the differentiation to >>> occur only when new vectors are added, though we have some disagreement >>> about this per above. Unfortunately I don't see an API to enable MSI-X >>> without some vectors, so some pre-allocation of irqs seems to be >>> required regardless. >> >> Right. pci_alloc_irq_vectors() or equivalent continues to be needed to >> enable MSI-X. Even so, it does seem possible (within vfio_msi_enable()) >> to just allocate one vector using pci_alloc_irq_vectors() >> and then immediately free it using pci_msix_free_irq(). What do you think? > > QEMU does something similar but I think it can really only be described > as a hack. In this case I think we can work with them being allocated > since that's essentially the static path. ok. In this case I understand the hybrid approach to be required. Without something (a hack) like this I am not able to see how an "active context" policy can be enforced though. Interrupts allocated during MSI-X enabling may not have eventfd associated and thus cannot adhere to an "active context" policy. I understand from earlier comments that we do not want to track where contexts are allocated so I can only see a way to enforce a policy that a context has an allocated interrupt, but not an enabled interrupt. >> If I understand correctly this can be done without allocating any context >> and leave MSI-X enabled without any interrupts allocated. This could be a >> way to accomplish the "active context" policy for dynamic allocation. >> This is not a policy that can be applied broadly to interrupt contexts though >> because MSI and non-dynamic MSI-X could still have contexts with allocated >> interrupts without eventfd. > > I think we could come up with wrappers that handle all cases, for > example: > > int vfio_pci_alloc_irq(struct vfio_pci_core_device *vdev, > unsigned int vector, int irq_type) > { > struct pci_dev *pdev = vdev->pdev; > struct msi_map map; > int irq; > > if (irq_type == VFIO_PCI_INTX_IRQ_INDEX) > return pdev->irq ?: -EINVAL; > > irq = pci_irq_vector(pdev, vector); > if (irq > 0 || irq_type == VFIO_PCI_MSI_IRQ_INDEX || > !vdev->has_dyn_msix) > return irq; > > map = pci_msix_alloc_irq_at(pdev, vector, NULL); > > return map.index; > } > > void vfio_pci_free_irq(struct vfio_pci_core_device *vdev, > unsigned in vector, int irq_type) > { > struct msi_map map; > int irq; > > if (irq_type != VFIO_PCI_INTX_MSIX_INDEX || > !vdev->has_dyn_msix) > return; > > irq = pci_irq_vector(pdev, vector); > map = { .index = vector, .virq = irq }; > > if (WARN_ON(irq < 0)) > return; > > pci_msix_free_irq(pdev, msix_map); > } Thank you very much for taking the time to write this out. I am not able to see where vfio_pci_alloc_irq()/vfio_pci_free_irq() would be called for an INTx interrupt. Is the INTx handling there for robustness or am I missing how it should be used for INTx interrupts? > At that point, maybe we'd check whether it makes sense to embed the irq > alloc/free within the ctx alloc/free. I think doing so would be the right thing to do since it helps to enforce the policy that interrupts and contexts are allocated together. I think this can be done when switching around the initialization within vfio_msi_set_vector_signal(). I need to look into this more. >>> But if non-active contexts were only placeholders in the pre-dynamic >>> world and we now manage them via a dynamic array, why is there any >>> pre-allocation of contexts without knowing the nature of the eventfd to >>> fill it? We could have more commonality between cases if contexts are >>> always dynamically allocated, which might simplify differentiation of >>> the has_dyn_msix cases largely to wrappers allocating and freeing irqs. >>> Thanks, >> >> Thank you very much for your guidance. I will digest this some more and >> see how wrappers could be used. In the mean time while trying to think how >> to unify this code I do think there is an issue in this patch in that >> the get_cached_msi_msg()/pci_write_msi_msg() >> should not be in an else branch. >> >> Specifically, I think it needs to be: >> if (msix) { >> if (irq == -EINVAL) { >> /* dynamically allocate interrupt */ >> } >> get_cached_msi_msg(irq, &msg); >> pci_write_msi_msg(irq, &msg); >> } > > Yes, that's looked wrong to me all along, I think that resolves it. Thank you very much. Reinette