On Fri, 06 Sep 2024 12:11:36 +0200 Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote: > On Thu, Sep 05 2024 at 12:23, Jonathan Cameron wrote: > >> Look at how the PCI device manages interrupts with the per device MSI > >> mechanism. Forget doing this with either one of the legacy mechanisms. > >> > >> 1) It creates a hierarchical interrupt domain and gets the required > >> resources from the provided parent domain. The PCI side does not > >> care whether this is x86 or arm64 and it neither cares whether the > >> parent domain does remapping or not. The only way it cares is about > >> the features supported by the different parent domains (think > >> multi-MSI). > >> > >> 2) Driver side allocations go through the per device domain > >> > >> That AUXbus is not any different. When the CPMU driver binds it wants to > >> allocate interrupts. So instead of having a convoluted construct > >> reaching into the parent PCI device, you simply can do: > >> > >> 1) Let the cxl_pci driver create a MSI parent domain and set that in > >> the subdevice::msi::irqdomain pointer. > >> > >> 2) Provide cxl_aux_bus_create_irqdomain() which allows the CPMU device > >> driver to create a per device interrupt domain. > >> > >> 3) Let the CPMU driver create its per device interrupt domain with > >> the provided parent domain > >> > >> 4) Let the CPMU driver allocate its MSI-X interrupts through the per > >> device domain > >> > >> Now on the cxl_pci side the AUXbus parent interrupt domain allocation > >> function does: > >> > >> if (!pci_msix_enabled(pdev)) > >> return pci_msix_enable_and_alloc_irq_at(pdev, ....); > >> else > >> return pci_msix_alloc_irq_at(pdev, ....); > > Ignore the above. Brainfart. > > > I'm struggling to follow this suggestion > > Would you expect the cxl_pci MSI parent domain to set it's parent as > > msi_domain = irq_domain_create_hierarchy(dev_get_msi_domain(&pdev->dev), > > IRQ_DOMAIN_FLAG_MSI_PARENT, > > ... > > which seems to cause a problem with deadlocks in __irq_domain_alloc_irqs() > > or create a separate domain structure and provide callbacks that reach into > > the parent domain as necessary? > > > > Or do I have this entirely wrong? I'm struggling to relate what existing > > code like PCI does to what I need to do here. > > dev_get_msi_domain(&pdev->dev) is a nightmare due to the 4 different > models we have: > > - Legacy has no domain > > - Non-hierarchical domain > > - Hierarchical domain v1 > > That's the global PCI/MSI domain > > - Hierarchical domain v2 > > That's the underlying MSI_PARENT domain, which is on x86 > either the interrupt remap unit or the vector domain. On arm64 > it's the ITS domain. > > See e.g. pci_msi_domain_supports() which handles the mess. > > Now this proposal will only work on hierarchical domain v2 because that > can do the post enable allocations on MSI-X. Let's put a potential > solution for MSI aside for now to avoid complexity explosion. > > So there are two ways to solve this: > > 1) Implement the CXL MSI parent domain as disjunct domain > > 2) Teach the V2 per device MSI-X domain to be a parent domain > > #1 Looks pretty straight forward, but does not seemlessly fit the > hierarchical domain model and creates horrible indirections. > > #2 Is the proper abstraction, but requires to teach the MSI core code > about stacked MSI parent domains, which should not be horribly hard > or maybe just works out of the box. > > The PCI device driver invokes the not yet existing function > pci_enable_msi_parent_domain(pci_dev). This function does: > > A) validate that this PCI device has a V2 parent domain > > B) validate that the device has enabled MSI-X > > C) validate that the PCI/MSI-X domain supports dynamic MSI-X > allocation > > D) if #A to #C are true, it enables the PCI device parent domain > > I made #B a prerequisite for now, as that's an orthogonal problem, which > does not need to be solved upfront. Maybe it does not need to be solved > at all because the underlying PCI driver always allocates a management > interrupt before dealing with the underlying "bus", which is IMHO a > reasonable expectation. At least it's a reasonable restriction for > getting started. > > That function does: > > msix_dom = pci_msi_get_msix_domain(pdev); > msix_dom->flags |= IRQ_DOMAIN_FLAG_MSI_PARENT; > msix_dom->msi_parent_ops = &pci_msix_parent_ops; > > When creating the CXL devices the CXL code invokes > pci_msi_init_subdevice(pdev, &cxldev->device), which just does: > > dev_set_msi_domain(dev, pci_msi_get_msix_subdevice_parent(pdev)); > > That allows the CXL devices to create their own per device MSI > domain via a new function pci_msi_create_subdevice_irq_domain(). > > That function can just use a variant of pci_create_device_domain() with > a different domain template and a different irqchip, where the irqchip > just redirects to the underlying parent domain chip, aka PCI-MSI-X. > > I don't think the CXL device will require a special chip as all they > should need to know is the linux interrupt number. If there are special > requirements then we can bring the IMS code back or do something similar > to the platform MSI code. > > Then you need pci_subdev_msix_alloc_at() and pci_subdev_msix_free() > which are the only two functions which the CXL (or similar) need. > > The existing pci_msi*() API function might need a safety check so they > can't be abused by subdevices, but that's a problem for a final > solution. > > That's pretty much all what it takes. Hope that helps. Absolutely! Thanks for providing the detailed suggestion this definitely smells a lot less nasty than previous approach. I have things sort of working now but it's ugly code with a few cross layer hacks that need tidying up (around programming the msi registers from wrong 'device'), so may be a little while before I get it in a state to post. Jonathan > > Thanks, > > tglx > > >