On 2021-11-29 1:51 p.m., Thomas Gleixner wrote: > Logan, > > On Mon, Nov 29 2021 at 11:21, Logan Gunthorpe wrote: >> On 2021-11-26 6:23 p.m., Thomas Gleixner wrote: >>> Replace the about to vanish iterators, make use of the filtering and take >>> the descriptor lock around the iteration. >> >> This patch looks good to me: >> >> Reviewed-by: Logan Gunthorpe <logang@xxxxxxxxxxxx> > > thanks for having a look at this. While I have your attention, I have a > question related to NTB. > > The switchtec driver is the only one which uses PCI_IRQ_VIRTUAL in order > to allocate non-hardware backed MSI-X descriptors. > > AFAIU these descriptors are not MSI-X descriptors in the regular sense > of PCI/MSI-X. They are allocated via the PCI/MSI mechanism but their > usage is somewhere in NTB which has nothing to do with the way how the > real MSI-X interrupts of a device work which explains why we have those > pci.msi_attrib.is_virtual checks all over the place. > > I assume that there are other variants feeding into NTB which can handle > that without this PCI_IRQ_VIRTUAL quirk, but TBH, I got completely lost > in that code. > > Could you please shed some light on the larger picture of this? Yes, of course. I'll try to explain: The NTB code here is trying to create an MSI interrupt that is not triggered by the PCI device itself but from a peer behind the Non-Transparent Bridge (or, more carefully: from the CPU's perspective the interrupt will come from the PCI device, but nothing in the PCI device's firmware or hardware will have triggered the interrupt). In most cases, the NTB code needs more interrupts than the hardware actually provides for in its MSI-X table. That's what PCI_IRQ_VIRTUAL is for: it allows the driver to request more interrupts than the hardware advertises (ie. pci_msix_vec_count()). These extra interrupts are created, but get flagged with msi_attrib.is_virtual which ensures functions that program the MSI-X table don't try to write past the end of the hardware's table. The NTB code in drivers/ntb/msi.c uses these virtual MSI-X interrupts. (Or rather it can use any interrupt: it doesn't care whether its virtual or not, it would be fine if it is just a spare interrupt in hardware, but in practice, it will usually be a virtual one). The code uses these interrupts by setting up a memory window in the bridge to cover the MMIO addresses of MSI-X interrupts. It communicates the offsets of the interrupts (and the MSI message data) to the peer so that the peer can trigger the interrupt simply by writing the message data to its side of the memory window. (In the code: ntbm_msi_request_threaded_irq() is called to request and interrupt which fills in the ntb_msi_desc with the offset and data, which is transferred to the peer which would then use ntb_msi_peer_trigger() to trigger the interrupt.) Existing NTB hardware does already have what's called a doorbell which provides the same functionally as the above technique. However, existing hardware implementations of doorbells have significant latency and thus slow down performance substantially. Implementing the MSI interrupts as described above increased the performance of ntb_transport by more than three times[1]. There aren't really other "variants". In theory, IDT hardware would also require the same quirk but the drivers in the tree aren't quite up to snuff and don't even support ntb_transport (so nobody has added support). Intel and AMD drivers could probably do this as well (provided they have extra memory windows) but I don't know that anyone has tried. Let me know if anything is still unclear or you have further questions. You can also read the last posting of the patch series[2] if you'd like more information. Logan [1] 2b0569b3b7e6 ("NTB: Add MSI interrupt support to ntb_transport") [2] https://lore.kernel.org/all/20190523223100.5526-1-logang@xxxxxxxxxxxx/T/#u