On Tue, 10 Sep 2024 22:04:18 +0200 Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote: > On Tue, Sep 10 2024 at 17:47, Jonathan Cameron wrote: > > There are quite a few places that make the assumption that the > > preirq_data->parent->chip is the right chip to for example call > > irq_set_affinity on. > > > > So the simple way to make it all work is to just have > > a msi_domain_template->msi_domain_ops->prepare_desc > > that sets the desc->dev = to the parent device (here the > > pci_dev->dev) > > Creative :) One bit that is challenging me is that the PCI access functions use the pci_dev that is embedded via irq_data->common->msi_desc->dev (and hence doesn't give us a obvious path to any hierarchical structures) E.g. __pci_write_msi_msg() and which checks the pci_dev is powered up. I'd like to be able to do a call to the parent similar to irq_chip_unmask_parent to handle write_msi_msg() for the new device domain. So how should this be avoided or should msi_desc->dev be the PCI device? I can see some options but maybe I'm missing the big picture. 1) Stop them using that path to get to the device because it might not be the right one. Instead use device via irq_data->chip_data (which is currently unused for these cases). Some slightly fiddly work will be needed to handle changing __pci_write_msi_msg() to talk irq_data though. 2) Maybe setting the msi descriptor dev to the one we actually write on is the right solution? (smells bad to me but I'm still getting my head around this stuff). Any immediate thoughts? Or am I still thinking about this the wrong way? (which is likely) > > > At that point everything more or less just works and all the > > rest of the callbacks can use generic forms. > > > > Alternative might be to make calls like the one in > > arch/x86/kernel/apic/msi.c msi_set_affinity search > > for the first ancestor device that has an irq_set_affinity(). > > That unfortunately may affect quite a few places. > > What's the problem? None of the CXL drivers should care about that at > all. Delegating other callbacks to the parent domain is what hierarchical > domains are about. If there is some helper missing, so we add it. > > > Anyhow, I'll probably send out the prepare_desc hack set with driver > > usage etc after I've written up a proper description of problems > > encountered etc so you can see what it all looks like and will be more > > palatable in general but in the meantime this is the guts of it of the > > variant where the subdev related desc has the dev set to the parent > > device. > > Let me look at this pile then :) > > > Note for the avoidance of doubt, I don't much like this > > solution but maybe it will grow on me ;) > > Maybe, but I doubt it will survive my abstraction taste check. > > > +static const struct msi_parent_ops pci_msix_parent_ops = { > > + .supported_flags = MSI_FLAG_PCI_MSIX | MSI_FLAG_PCI_MSIX_ALLOC_DYN, > > + .prefix = "PCI-MSIX-PROXY-", > > + .init_dev_msi_info = msi_parent_init_dev_msi_info, > > The obligatory link to: > > https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#struct-declarations-and-initializers > Thanks. Will fix up before I get anywhere near a real posting! > > +}; > > + > > +int pci_msi_enable_parent_domain(struct pci_dev *pdev) > > +{ > > + struct irq_domain *msix_dom; > > + /* TGLX: Validate has v2 parent domain */ > > + /* TGLX: validate msix enabled */ > > + /* TGLX: Validate msix domain supports dynamics msi-x */ > > + > > + /* Enable PCI device parent domain */ > > + msix_dom = dev_msi_get_msix_device_domain(&pdev->dev); > > + msix_dom->flags |= IRQ_DOMAIN_FLAG_MSI_PARENT; > > + msix_dom->msi_parent_ops = &pci_msix_parent_ops; > > That want's to be a call into the MSI core code, Something like: > > msi_enable_sub_parent_domain(&pdev->dev, DOMAIN_BUS_PCI_DEVICE_MSIX) > > or something like that. I really don't want to expose the internals of > MSI. I spent a lot of effort to encapsulate that... > > > +void pci_msi_init_subdevice(struct pci_dev *pdev, struct device *dev) > > +{ > > + dev_set_msi_domain(dev, dev_msi_get_msix_device_domain(&pdev->dev)); > > msi_set_parent_domain(.....); > > > +static void pci_subdev_msix_prepare_desc(struct irq_domain *domain, msi_alloc_info_t *arg, > > + struct msi_desc *desc) > > +{ > > + struct device *parent = desc->dev->parent; > > + > > + if (!desc->pci.mask_base) { > > + /* Not elegant - but needed for irq affinity to work? */ > > Which makes me ask the question why desc->pci.mask_base is NULL in the > first place. That needs some thought at the place which initializes the > descriptor. Maybe this is the other way around? The descriptor is 'never' initialized for this case. The equivalent code in msix_prepare_msi_desc() has comments " This is separate from msix_setup_msi_descs() below to handle dynamic allocations for MSI-X after initial enablement." So I think this !desc->pci.mask_base should always succeed (and so I should get rid of it and run the descriptor initialize unconditionally) It might be appropriate to call the prepare_desc from the parent msi_domain though via irq_domain->parent_domain->host_data However this has the same need to not be based on the msi_desc->dev subject to the question above. Jonathan > > > --- a/kernel/irq/msi.c > > +++ b/kernel/irq/msi.c > > @@ -1720,3 +1720,8 @@ bool msi_device_has_isolated_msi(struct device *dev) > > return arch_is_isolated_msi(); > > } > > EXPORT_SYMBOL_GPL(msi_device_has_isolated_msi); > > +struct irq_domain *dev_msi_get_msix_device_domain(struct device *dev) > > You clearly run out of new lines here :) > > > +{ > > + return dev->msi.data->__domains[0].domain; > > +} > > But aside of that. No. See above. > > Thanks, > > tglx >