On Tue, Nov 30 2021 at 16:28, Jason Gunthorpe wrote:
> On Tue, Nov 30, 2021 at 08:48:03PM +0100, Thomas Gleixner wrote:
>> On Tue, Nov 30 2021 at 12:21, Logan Gunthorpe wrote:
>> > On 2021-11-29 5:29 p.m., Thomas Gleixner wrote:
>> >> I'm way too tired to come up with a proper solution for that, but that
>> >> PCI_IRQ_VIRTUAL has to die ASAP.
>> >
>> > I'm willing to volunteer a bit of my time to clean this up, but I'd need
>> > a bit more direction on what a proper solution would look like. The MSI
>> > domain code is far from well documented nor is it easy to understand.
>>
>> Fair enough. I'm struggling with finding time to document that properly.
>>
>> I've not yet made my mind up what the best way forward for this is, but
>> I have a few ideas which I want to explore deeper.
>
> I may have lost the plot in all of these patches, but I thought the
> direction was moving toward the msi_domain_alloc_irqs() approach IDXD
> demo'd here:
>
> https://lore.kernel.org/kvm/162164243591.261970.3439987543338120797.stgit@xxxxxxxxxxxxxxxxxxxxxxxxxx/

Yes, that's something I have in mind. Though this patch series would not
be really required to support IDXD, it's making stuff simpler.

The main point of this is to cure the VFIO issue of tearing down MSI-X of
passed through devices in order to expand the MSI-X vector space on the
host.

> I'd expect all the descriptor handling code in drivers/ntb/msi.c to
> get wrapped in an irq_chip instead of inserting a single-use callback
> to the pci core code's implementation:
>
> void __pci_write_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
> {
> 	if (entry->write_msi_msg)
> 		entry->write_msi_msg(entry, entry->write_msi_msg_data);
>
> If this doesn't become an irq_chip what other way is there to properly
> program the addr/data pair as drivers/ntb/msi.c is doing?

That's not the question. This surely will be a separate irq chip and a
separate irqdomain.
The real problem is where to store the MSI descriptors because the PCI
device has its own real PCI/MSI-X interrupts which means it still shares
the storage space.

IDXD is different in that regard because IDXD creates subdevices which
have their own struct device and they just store the MSI descriptors in
the msi data of that device.

I'm currently tending to partition the index space in the xarray:

 0x00000000 - 0x0000ffff          PCI/MSI-X
 0x00010000 - 0x0001ffff          NTB

which is feasible now with the range modifications and way simpler to do
with xarray than with the linked list.

Thanks,

        tglx
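To make the proposed split concrete, here is a small userspace model of the index arithmetic only. It assumes each owner gets a fixed 64K window selected by the upper 16 bits of the index, which matches the two ranges listed above. All identifiers (`msi_idx_range`, `msi_global_index()`, and so on) are hypothetical and are not kernel names. The storage itself, an xarray in the kernel, is not modeled here. The point is that the key encodes the owner, so PCI/MSI-X and NTB descriptors can share one per-device store without colliding, and visiting one owner's descriptors reduces to walking one index range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of partitioning one device's MSI descriptor index
 * space into fixed 64K ranges. Plain index arithmetic, not kernel code.
 */
enum msi_idx_range {
	MSI_RANGE_PCI_MSIX = 0,		/* 0x00000000 - 0x0000ffff */
	MSI_RANGE_NTB      = 1,		/* 0x00010000 - 0x0001ffff */
	MSI_RANGE_COUNT
};

#define MSI_RANGE_SHIFT	16
#define MSI_RANGE_SIZE	(1u << MSI_RANGE_SHIFT)	/* 0x10000 indices per range */

/* First global index of a range, e.g. 0x00010000 for NTB. */
static uint32_t msi_range_first(enum msi_idx_range r)
{
	assert(r < MSI_RANGE_COUNT);
	return (uint32_t)r << MSI_RANGE_SHIFT;
}

/* Last global index of a range, e.g. 0x0001ffff for NTB. */
static uint32_t msi_range_last(enum msi_idx_range r)
{
	return msi_range_first(r) + MSI_RANGE_SIZE - 1;
}

/*
 * Map a range-local index (e.g. NTB vector 3) to the global index the
 * descriptor would be stored under. Fails if it does not fit the range.
 */
static bool msi_global_index(enum msi_idx_range r, uint32_t local,
			     uint32_t *index)
{
	if (r >= MSI_RANGE_COUNT || local >= MSI_RANGE_SIZE)
		return false;
	*index = msi_range_first(r) + local;
	return true;
}

/*
 * Reverse lookup: which range owns a global index? Returns
 * MSI_RANGE_COUNT for indices outside all defined ranges.
 */
static enum msi_idx_range msi_index_range(uint32_t index)
{
	uint32_t r = index >> MSI_RANGE_SHIFT;

	return r < MSI_RANGE_COUNT ? (enum msi_idx_range)r : MSI_RANGE_COUNT;
}
```

For example, NTB vector 3 maps to global index 0x00010003, and msi_index_range() maps that index back to the NTB range. With a linked list, keeping the two owners apart would need an extra tag per descriptor plus an ordered walk. With an index-keyed store, the owner falls out of the key. In the kernel, a range walk over an xarray could plausibly use something like xa_for_each_range(), but that choice is an assumption here and is not part of this sketch.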