On Thu, Jun 29, 2023 at 05:56:31PM -0500, Bjorn Helgaas wrote:
> On Thu, Jun 29, 2023 at 10:19:47AM -0700, Lizhi Hou wrote:
> > The PCI endpoint device such as Xilinx Alveo PCI card maps the register
> > spaces from multiple hardware peripherals to its PCI BAR. Normally,
> > the PCI core discovers devices and BARs using the PCI enumeration process.
> > There is no infrastructure to discover the hardware peripherals that are
> > present in a PCI device, and which can be accessed through the PCI BARs.
>
> IIUC this is basically a multi-function device except that instead of
> each device being a separate PCI Function, they all appear in a single
> Function. That would mean all the devices share the same config space
> so a single PCI Command register controls all of them, they all share
> the same IRQs (either INTx or MSI/MSI-X), any MMIO registers are likely
> in a shared BAR, etc., right?

Could be multiple BARs, but yes.

> Obviously PCI enumeration only sees the single Function and binds a
> single driver to it. But IIUC, you want to use existing drivers for
> each of these sub-devices, so this series adds a DT node for the
> single Function (using the quirks that call of_pci_make_dev_node()).
> And I assume that when the PCI driver claims the single Function, it
> will use that DT node to add platform devices, and those existing
> drivers can claim those?

Yes. It will call some variant of of_platform_populate().

> I don't see the PCI driver for the single Function in this series. Is
> that coming? Is this series useful without it?

https://lore.kernel.org/all/20220305052304.726050-4-lizhi.hou@xxxxxxxxxx/

I asked for things to be split up as the original series did a lot of
new things at once. This series only works with the QEMU PCI test
device which the DT unittest will use.

> > Apparently, the device tree framework requires a device tree node for the
> > PCI device. Thus, it can generate the device tree nodes for hardware
> > peripherals underneath.
> > Because PCI is self discoverable bus, there might
> > not be a device tree node created for PCI devices. Furthermore, if the PCI
> > device is hot pluggable, when it is plugged in, the device tree nodes for
> > its parent bridges are required. Add support to generate device tree node
> > for PCI bridges.
>
> Can you remind me why hot-adding a PCI device requires DT nodes for
> parent bridges?

Because the PCI device needs a DT node and we can't just put PCI
devices in the DT root. We have to create the bus hierarchy.

> I don't think we have those today, so maybe the DT
> node for the PCI device requires a DT parent? How far up does that
> go?

All the way.

> From this patch, I guess a Root Port would be the top DT node on
> a PCIe system, since that's the top PCI-to-PCI bridge?

Yes. Plus, there could be a hierarchy of nodes above the host bridge.

> This patch adds a DT node for *every* PCI bridge in the system. We
> only actually need that node for these unusual devices. Is there some
> way the driver for the single PCI Function could add that node when it
> is needed? Sorry if you've answered this in the past; maybe the
> answer could be in the commit log or a code comment in case somebody
> else wonders.

This was discussed early on. I don't think it would work to create the
nodes at the time we discover we have a device that wants a DT node.
The issue is that decisions are made in the code based on whether
there's a DT node for a PCI device or not. It might work, but I think
it's fragile to have nodes attached to devices at different points in
time.

>
> > @@ -340,6 +340,8 @@ void pci_bus_add_device(struct pci_dev *dev)
> >  	 */
> >  	pcibios_bus_add_device(dev);
> >  	pci_fixup_device(pci_fixup_final, dev);
> > +	if (pci_is_bridge(dev))
> > +		of_pci_make_dev_node(dev);
>
> It'd be nice to have a clue here about why we need this, since this is
> executed for *every* system, even ACPI platforms that typically don't
> use OF things.
>
> >  	pci_create_sysfs_dev_files(dev);
> >  	pci_proc_attach_device(dev);
> >  	pci_bridge_d3_update(dev);
> > diff --git a/drivers/pci/of.c b/drivers/pci/of.c
> > index 2c25f4fa0225..9786ae407948 100644
> > --- a/drivers/pci/of.c
> > +++ b/drivers/pci/of.c
> > @@ -487,6 +487,15 @@ static int of_irq_parse_pci(const struct pci_dev *pdev, struct of_phandle_args *
> >  		} else {
> >  			/* We found a P2P bridge, check if it has a node */
> >  			ppnode = pci_device_to_OF_node(ppdev);
> > +#if IS_ENABLED(CONFIG_PCI_DYNAMIC_OF_NODES)
>
> I would use plain #ifdef here instead of IS_ENABLED(), as you did in
> pci.h below. IS_ENABLED() is true if the Kconfig symbol is set to
> either "y" or "m".
>
> Using IS_ENABLED() suggests that the config option *could* be a
> module, which is not the case here because CONFIG_PCI_DYNAMIC_OF_NODES
> is a bool.
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/kconfig.h?id=v6.4#n69
>
> > @@ -617,6 +626,85 @@ int devm_of_pci_bridge_init(struct device *dev, struct pci_host_bridge *bridge)
> >  	return pci_parse_request_of_pci_ranges(dev, bridge);
> >  }
> >
> > +#if IS_ENABLED(CONFIG_PCI_DYNAMIC_OF_NODES)
>
> Same here, of course.
>
> > +void of_pci_remove_node(struct pci_dev *pdev)
> > +{
> > +	struct device_node *np;
> > +
> > +	np = pci_device_to_OF_node(pdev);
> > +	if (!np || !of_node_check_flag(np, OF_DYNAMIC))
>
> > + * Each entry in the ranges table is a tuple containing the child address,
> > + * the parent address, and the size of the region in the child address space.
> > + * Thus, for PCI, in each entry parent address is an address on the primary
> > + * side and the child address is the corresponding address on the secondary
> > + * side.
> > + */
> > +struct of_pci_range {
> > +	u32 child_addr[OF_PCI_ADDRESS_CELLS];
> > +	u32 parent_addr[OF_PCI_ADDRESS_CELLS];
> > +	u32 size[OF_PCI_SIZE_CELLS];
>
> > +		if (pci_is_bridge(pdev)) {
> > +			memcpy(rp[i].child_addr, rp[i].parent_addr,
> > +			       sizeof(rp[i].child_addr));
> > +		} else {
> > +			/*
> > +			 * For endpoint device, the lower 64-bits of child
> > +			 * address is always zero.
>
> I think this connects with the secondary side comment above, right? I
> think I would comment this as:
>
>   /*
>    * PCI-PCI bridges don't translate addresses, so the child
>    * (secondary side) address is identical to the parent (primary
>    * side) address.
>    */
>
> and
>
>   /*
>    * Non-bridges have no child (secondary side) address, so clear it
>    * out.
>    */
>
> > +			 */
> > +			rp[i].child_addr[0] = j;
>
> > +	ret = of_changeset_add_empty_prop(ocs, np, "dynamic");
>
> It seems slightly confusing to use a "dynamic" property here when we
> also have the OF_DYNAMIC dynamic flag above. I think they have
> different meanings, don't they?

Hum, what's the property for? It's new in this version. Any DT property
needs to be documented, but I don't see why we need it.

Rob
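
P.S. The bridge versus non-bridge handling of "ranges" entries discussed
above can be sketched in plain userspace C. This is only an illustration
under stated assumptions, not the patch's code: fill_range() and its
parameters are hypothetical names made up for this example, uint32_t
stands in for the kernel's u32, the cell counts are the usual PCI bus
binding values (3 address cells, 2 size cells), and the value stored in
child_addr[0] for non-bridges (j in the quoted hunk) is assumed to be an
index such as a BAR number.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define OF_PCI_ADDRESS_CELLS	3	/* PCI bus binding: 3 address cells */
#define OF_PCI_SIZE_CELLS	2	/* PCI bus binding: 2 size cells */

struct of_pci_range {
	uint32_t child_addr[OF_PCI_ADDRESS_CELLS];
	uint32_t parent_addr[OF_PCI_ADDRESS_CELLS];
	uint32_t size[OF_PCI_SIZE_CELLS];
};

/*
 * Hypothetical helper (not from the patch): fill one "ranges" entry.
 * 'flags' is the first (phys.hi) cell of the parent address and 'idx'
 * is the assumed per-device index stored for non-bridges.
 */
static void fill_range(struct of_pci_range *rp, int is_bridge,
		       uint32_t flags, uint64_t addr, uint64_t size,
		       uint32_t idx)
{
	assert(rp != NULL);
	memset(rp, 0, sizeof(*rp));

	/* Parent (primary side) address: flags cell plus 64-bit address. */
	rp->parent_addr[0] = flags;
	rp->parent_addr[1] = (uint32_t)(addr >> 32);
	rp->parent_addr[2] = (uint32_t)addr;

	rp->size[0] = (uint32_t)(size >> 32);
	rp->size[1] = (uint32_t)size;

	if (is_bridge) {
		/*
		 * PCI-PCI bridges don't translate addresses, so the child
		 * (secondary side) address is identical to the parent.
		 */
		memcpy(rp->child_addr, rp->parent_addr,
		       sizeof(rp->child_addr));
	} else {
		/*
		 * Non-bridges have no child (secondary side) address; the
		 * lower 64 bits stay zero and only the index is stored.
		 */
		rp->child_addr[0] = idx;
	}
}
```

Calling fill_range() with is_bridge set and sample data such as flags
0x02000000 (the 32-bit memory space code) and address 0x1a0000000 yields
a child address equal to the parent address cell for cell; with
is_bridge clear, the child address is just { idx, 0, 0 }.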