Jason, On Mon, Dec 06 2021 at 10:43, Jason Gunthorpe wrote: > On Sun, Dec 05, 2021 at 03:16:40PM +0100, Thomas Gleixner wrote: >> > That's not really a good idea because dev->irqdomain is a generic >> > mechanism and not restricted to the PCI use case. Special casing it for >> > PCI is just wrong. Special casing it for all use cases just to please >> > PCI is equally wrong. There is a world outside of PCI and x86. >> >> That argument is actually only partially correct. > > I'm not sure I understood your reply? I think we are both agreeing > that dev->irqdomain wants to be a generic mechanism? Yes. I managed to confuse myself there by being too paranoid about how to distinguish things on platforms which need to support both ways, i.e. x86 when XEN is enabled. > I'd say that today we've special cased it to handle PCI. IMHO that is > exactly what pci_msi_create_irq_domain() is doing - it replaces the > chip ops with ops that can *ONLY* do PCI MSI and so dev->irqdomain > becomes PCI only and non-generic. Right. See above. That's why I went back to my notes, did some more research ... >> 2) Guest support is strictly opt-in >> >> The underlying architecture/subarchitecture specific irqdomain has >> to detect at setup time (eventually early boot), whether the >> underlying hypervisor supports it. >> >> The only reasonable way to support that is the availability of >> interrupt remapping via vIOMMU, as we discussed before. > > This is talking about IMS specifically because of the legacy issue > where the MSI addr/data pair inside a guest is often completely fake? This is about IMS, right. PCI/MSI[x] is handled today because the writes to the MSI/MSI-X message store can be trapped. >> That does not work in all cases due to architecture and host >> controller constraints, so we might end up with: >> >> VECTOR -> IOMMU -> SHIM -> PCI/[MSI/MSI-X/IMS] domains > > OK - I dont' know enough about the architecture/controller details to > imagine what SHIM is, but if it allows keeping the PCI code as purely > PCI code, then great It's today part of the arch/subarch specific PCI/MSI domain to deal with quirks above the IOMMU level. As we can't proliferate that into the new endpoint domain, that needs to be done as a shim layer in between which has no real other functionality than applying the quirks. Yes, it's all pretty. Welcome to my wonderful world. >> - The irqchip callbacks which can be implemented by these top >> level domains are going to be restricted. > > OK - I think it is great that the driver will see a special ops struct > that is 'ops for device's MSI addr/data pair storage'. It makes it > really clear what it is It will need some more than that, e.g. mask/unmask and as we discussed quite some time ago something like the irq_buslock/unlock pair, so you can handle updates to the state from thread context via a command queue (IIRC). >> - For the irqchip callbacks which are allowed/required the rules >> vs. following down the hierarchy need to be defined and >> enforced. > > The driver should be the ultimate origin of the interrupt so it is > always end-point in the hierarchy, opposite the CPU? > > I would hope the driver doesn't have an exposure to hierarchy? No. > So we have a new concept: 'device MSI storage ops' > > Put them along with the xarray holding the msi_descs and you've got my > msi_table :) Hehe. >> Sorry Jason, no tables for you. :) > > How does the driver select with 'device MSI storage ops' it is > requesting a MSI for ? Via some cookie, reference whatever as discussed in the other mail. We'll bikeshed the naming once I get there :) >> 1) I'm going to post part 1-3 of the series once more with the fallout >> and review comments addressed. > > OK, I didn't see anything in there that was making anything harder in > this direction It's helping to keep the existing stuff including the !irqdomain parts sufficiently self contained so I can actually change the inner workings of msi domains without going back to any of these places (hopefully). >> 5) Implement an IMS user. >> >> The obvious candidate which should be halfways accessible is the >> ath11 PCI driver which falls into that category. > > Aiiee: Yes. > drivers/net/wireless/ath/ath11k/pci.c: ab_pci->msi_ep_base_data = msi_desc->msg.data; That's only one part of it. Look how the address is retrieved. Thanks, tglx