On Tue, Nov 30 2021 at 22:23, Thomas Gleixner wrote:
> On Tue, Nov 30 2021 at 16:28, Jason Gunthorpe wrote:
>
> The real problem is where to store the MSI descriptors because the PCI
> device has its own real PCI/MSI-X interrupts which means it still shares
> the storage space.

Bah. I confused myself by staring at the existing code instead of
looking at how this NTB stuff actually works.

So if I understand it correctly then the end result looks like this:

 1) PCIe device (switchtec)

    The device has 4 MSI[X] interrupts: event, dma_rpc, message,
    doorbell. The event and dma_rpc interrupts are requested by the
    switchtec PCI driver itself.

 2) Switchtec character device

    The switchtec PCI driver creates a character device which is
    exposed for device specific IOCTLs.

    The device belongs to the switchtec_class device class.

 3) Switchtec NTB device

    The ntb_hw_switchtec driver registers the switchtec_class class
    interface.

    So when #2 is registered with the driver core the switchtec class
    interface add_dev() function is invoked. That function creates a NTB
    device, requests the message and the doorbell interrupts which have
    been allocated by the underlying PCIe device driver (#1) and
    registers the NTB device with the NTB core.

 4) The NTB core then tries to use the virtual MSI vectors which have
    been allocated by the switchtec driver in #1 and requires the msg
    write intercept to actually expose it to the peers.

So we really can go and create a MSI irqdomain and stick the pointer
into stdev->dev.irqdomain. The parent domain of this irqdomain is

     stdev->pdev.dev.irqdomain->parent

which is either the irq remapping domain or the vector domain. Which is
pretty much what I proposed as general facility for IMS/IDXD. I need to
go back and polish that up on top of the current pile.

Along with that have an irq chip implementation which exposes:

static struct irq_chip ntb_chip = {
	.name			= "ntb",
	.irq_ack		= irq_chip_ack_parent,
	.irq_write_msi_msg	= ntb_msi_write_msg,
#ifdef CONFIG_SMP
	.irq_set_affinity	= irq_chip_set_affinity_parent,
#endif
};

We just need some reasonable solution for the DMA/remap problem Jason
mentioned vs. msi_desc::dev, but that wants to be cleaned up in any
case for all the aliasing muck.

Thanks,

        tglx