On Thu, 12 Sep 2024 17:37:20 +0100 Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx> wrote: > On Tue, 10 Sep 2024 22:04:18 +0200 > Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote: > > > On Tue, Sep 10 2024 at 17:47, Jonathan Cameron wrote: > > > There are quite a few places that make the assumption that the > > > preirq_data->parent->chip is the right chip to for example call > > > irq_set_affinity on. > > > > > > So the simple way to make it all work is to just have > > > a msi_domain_template->msi_domain_ops->prepare_desc > > > that sets the desc->dev = to the parent device (here the > > > pci_dev->dev) > > > > Creative :) > > One bit that is challenging me is that the PCI access functions > use the pci_dev that is embedded via irq_data->common->msi_desc->dev > (and hence doesn't give us a obvious path to any hierarchical structures) > E.g. __pci_write_msi_msg() and which checks the pci_dev is > powered up. > > I'd like to be able to do a call to the parent similar to irq_chip_unmask_parent > to handle write_msi_msg() for the new device domain. > > So how should this be avoided or should msi_desc->dev be the > PCI device? I thought about this whilst cycling home. Perhaps my fundamental question is more basic Which device should msi_dec->dev be? * The ultimate requester of the msi - so here the one calling our new pci_msix_subdev_alloc_at(), * The one on which the msi_write_msg() should occur. - here that would be the struct pci_dev given the checks needed on whether the device is powered up. If this is the case, can we instead use the irq_data->dev in __pci_write_msi_msg()? Also, is it valid to use the irq_domain->dev for callbacks such as msi_prepare_desc on basis that (I think) is the device that 'hosts' the irq domain, so can we use it to replace the use of msi_desc->dev in pci_msix_prepare_desc() If we can that enables our subdev to use a prepare_desc callback that does something like struct msi_domain *info; domain = domain->parent; info = domain->host_data; info->ops->prepare_desc(domain, arg, desc); Thanks for any pointers! Jonathan > > I can see some options but maybe I'm missing the big picture. > 1) Stop them using that path to get to the device because it > might not be the right one. Instead use device via irq_data->chip_data > (which is currently unused for these cases). > Some slightly fiddly work will be needed to handle changing __pci_write_msi_msg() > to talk irq_data though. > 2) Maybe setting the msi descriptor dev to the one we actually write > on is the right solution? (smells bad to me but I'm still getting > my head around this stuff). > > Any immediate thoughts? Or am I still thinking about this the wrong > way? (which is likely) > > > > > > At that point everything more or less just works and all the > > > rest of the callbacks can use generic forms. > > > > > > Alternative might be to make calls like the one in > > > arch/x86/kernel/apic/msi.c msi_set_affinity search > > > for the first ancestor device that has an irq_set_affinity(). > > > That unfortunately may affect quite a few places. > > > > What's the problem? None of the CXL drivers should care about that at > > all. Delegating other callbacks to the parent domain is what hierarchical > > domains are about. If there is some helper missing, so we add it. > > > > > Anyhow, I'll probably send out the prepare_desc hack set with driver > > > usage etc after I've written up a proper description of problems > > > encountered etc so you can see what it all looks like and will be more > > > palatable in general but in the meantime this is the guts of it of the > > > variant where the subdev related desc has the dev set to the parent > > > device. > > > > Let me look at this pile then :) > > > > > Note for the avoidance of doubt, I don't much like this > > > solution but maybe it will grow on me ;) > > > > Maybe, but I doubt it will survive my abstraction taste check. > > > > > +static const struct msi_parent_ops pci_msix_parent_ops = { > > > + .supported_flags = MSI_FLAG_PCI_MSIX | MSI_FLAG_PCI_MSIX_ALLOC_DYN, > > > + .prefix = "PCI-MSIX-PROXY-", > > > + .init_dev_msi_info = msi_parent_init_dev_msi_info, > > > > The obligatory link to: > > > > https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#struct-declarations-and-initializers > > > > Thanks. Will fix up before I get anywhere near a real posting! > > > > +}; > > > + > > > +int pci_msi_enable_parent_domain(struct pci_dev *pdev) > > > +{ > > > + struct irq_domain *msix_dom; > > > + /* TGLX: Validate has v2 parent domain */ > > > + /* TGLX: validate msix enabled */ > > > + /* TGLX: Validate msix domain supports dynamics msi-x */ > > > + > > > + /* Enable PCI device parent domain */ > > > + msix_dom = dev_msi_get_msix_device_domain(&pdev->dev); > > > + msix_dom->flags |= IRQ_DOMAIN_FLAG_MSI_PARENT; > > > + msix_dom->msi_parent_ops = &pci_msix_parent_ops; > > > > That want's to be a call into the MSI core code, Something like: > > > > msi_enable_sub_parent_domain(&pdev->dev, DOMAIN_BUS_PCI_DEVICE_MSIX) > > > > or something like that. I really don't want to expose the internals of > > MSI. I spent a lot of effort to encapsulate that... > > > > > +void pci_msi_init_subdevice(struct pci_dev *pdev, struct device *dev) > > > +{ > > > + dev_set_msi_domain(dev, dev_msi_get_msix_device_domain(&pdev->dev)); > > > > msi_set_parent_domain(.....); > > > > > +static void pci_subdev_msix_prepare_desc(struct irq_domain *domain, msi_alloc_info_t *arg, > > > + struct msi_desc *desc) > > > +{ > > > + struct device *parent = desc->dev->parent; > > > + > > > + if (!desc->pci.mask_base) { > > > + /* Not elegant - but needed for irq affinity to work? */ > > > > Which makes me ask the question why desc->pci.mask_base is NULL in the > > first place. That needs some thought at the place which initializes the > > descriptor. > > Maybe this is the other way around? The descriptor is 'never' initialized > for this case. The equivalent code in msix_prepare_msi_desc() has > comments > " This is separate from msix_setup_msi_descs() below to handle dynamic > allocations for MSI-X after initial enablement." > So I think this !desc->pci.mask_base should always succeed > (and so I should get rid of it and run the descriptor initialize unconditionally) > > It might be appropriate to call the prepare_desc from the parent msi_domain though > via irq_domain->parent_domain->host_data > However this has the same need to not be based on the msi_desc->dev subject > to the question above. > > Jonathan > > > > > > > > --- a/kernel/irq/msi.c > > > +++ b/kernel/irq/msi.c > > > @@ -1720,3 +1720,8 @@ bool msi_device_has_isolated_msi(struct device *dev) > > > return arch_is_isolated_msi(); > > > } > > > EXPORT_SYMBOL_GPL(msi_device_has_isolated_msi); > > > +struct irq_domain *dev_msi_get_msix_device_domain(struct device *dev) > > > > You clearly run out of new lines here :) > > > > > +{ > > > + return dev->msi.data->__domains[0].domain; > > > +} > > > > But aside of that. No. See above. > > > > Thanks, > > > > tglx > > >