On Sat, Jun 22, 2024 at 12:38:49PM -0500, Bjorn Helgaas wrote: > On Fri, Jun 21, 2024 at 05:43:21PM -0500, Bjorn Helgaas wrote: > > On Fri, Jun 21, 2024 at 06:29:48PM -0400, Frank Li wrote: > > > On Mon, Jun 17, 2024 at 10:26:36AM -0400, Frank Li wrote: > > > > On Thu, Jun 13, 2024 at 05:41:25PM -0500, Bjorn Helgaas wrote: > > > > > On Thu, Jun 06, 2024 at 04:24:17PM -0400, Frank Li wrote: > > > > > > On Mon, Jun 03, 2024 at 04:07:55PM -0400, Frank Li wrote: > > > > > > > On Mon, Jun 03, 2024 at 01:56:27PM -0500, Bjorn Helgaas wrote: > > > > > > > > On Mon, Jun 03, 2024 at 02:42:45PM -0400, Frank Li wrote: > > > > > > > > > On Mon, Jun 03, 2024 at 12:19:21PM -0500, Bjorn Helgaas wrote: > > > > > > > > > > On Fri, May 31, 2024 at 03:58:49PM +0100, Robin Murphy wrote: > > > > > > > > > > > On 2024-05-31 12:08 am, Bjorn Helgaas wrote: > > > > > > > > > > > > [+cc IOMMU and pcie-apple.c folks for comment] > > > > > > > > > > > > > > > > > > > > > > > > On Tue, May 28, 2024 at 03:39:21PM -0400, Frank Li wrote: > > > > > > > > > > > > > For the i.MX95, configuration of a LUT is necessary to convert Bus Device > > > > > > > > > > > > > Function (BDF) to stream IDs, which are utilized by both IOMMU and ITS. > > > > > > > > > > > > > This involves examining the msi-map and smmu-map to ensure consistent > > > > > > > > > > > > > mapping of PCI BDF to the same stream IDs. Subsequently, LUT-related > > > > > > > > > > > > > registers are configured. In the absence of an msi-map, the built-in MSI > > > > > > > > > > > > > controller is utilized as a fallback. > > > > > > > > > > > > > > > > > > > > > > > > > > Additionally, register a PCI bus notifier to trigger imx_pcie_add_device() > > > > > > > > > > > > > upon the appearance of a new PCI device and when the bus is an iMX6 PCI > > > > > > > > > > > > > controller. This function configures the correct LUT based on Device Tree > > > > > > > > > > > > > Settings (DTS). > > > > > > > > > > > > > > > > > > > > > > > > This scheme is pretty similar to apple_pcie_bus_notifier(). If we > > > > > > > > > > > > have to do this, I wish it were *more* similar, i.e., copy the > > > > > > > > > > > > function names, bitmap tracking, code structure, etc. > > > > > > > > > > > > > > > > > > > > > > > > I don't really know how stream IDs work, but I assume they are used on > > > > > > > > > > > > most or all arm64 platforms, so I'm a little surprised that of all the > > > > > > > > > > > > PCI host drivers used on arm64, only pcie-apple.c and pci-imx6.c need > > > > > > > > > > > > this notifier. > > > > > > > > > > > > > > > > > > > > > > This is one of those things that's mostly at the mercy of the PCIe root > > > > > > > > > > > complex implementation. Typically the SMMU StreamID and/or GIC ITS DeviceID > > > > > > > > > > > is derived directly from the PCI RID, sometimes with additional high-order > > > > > > > > > > > bits hard-wired to disambiguate PCI segments. I believe this RID-translation > > > > > > > > > > > LUT is a particular feature of the the Synopsys IP - I know there's also one > > > > > > > > > > > on the NXP Layerscape platforms, but on those it's programmed by the > > > > > > > > > > > bootloader, which also generates the appropriate "msi-map" and "iommu-map" > > > > > > > > > > > properties to match. Ideally that's what i.MX should do as well, but hey. > > > > > > > > > > > > > > > > > > > > Maybe this RID-translation is a feature of i.MX, not of Synopsys? I > > > > > > > > > > see that the LUT CSR accesses use IMX95_* definitions. > > > > > > > > > > > > > > > > > > Yes, it convert 16bit RID to 6bit stream id. > > > > > > > > > > > > > > > > IIUC, you're saying this is not a Synopsys feature, it's an i.MX > > > > > > > > feature. > > > > > > > > > > > > > > Yes, it is i.MX feature. But I think other vendor should have similar > > > > > > > situation if use old arm smmu. > > > > > > > > > > > > > > > > > > > > > > > > > > If it's really necessary to do this programming from Linux, then there's > > > > > > > > > > > still no point in it being dynamic - the mappings cannot ever change, since > > > > > > > > > > > the rest of the kernel believes that what the DT said at boot time was > > > > > > > > > > > already a property of the hardware. It would be a lot more logical, and > > > > > > > > > > > likely simpler, for the driver to just read the relevant map property and > > > > > > > > > > > program the entire LUT to match, all in one go at controller probe time. > > > > > > > > > > > Rather like what's already commonly done with the parsing of "dma-ranges" to > > > > > > > > > > > program address-translation LUTs for inbound windows. > > > > > > > > > > > > > > > > > > > > > > Plus that would also give a chance of safely dealing with bad DTs specifying > > > > > > > > > > > invalid ID mappings (by refusing to probe at all). As it is, returning an > > > > > > > > > > > error from a child's BUS_NOTIFY_ADD_DEVICE does nothing except prevent any > > > > > > > > > > > further notifiers from running at that point - the device will still be > > > > > > > > > > > added, allowed to bind a driver, and able to start sending DMA/MSI traffic > > > > > > > > > > > without the controller being correctly programmed, which at best won't work > > > > > > > > > > > and at worst may break the whole system. > > > > > > > > > > > > > > > > > > > > Frank, could the imx LUT be programmed once at boot-time instead of at > > > > > > > > > > device-add time? I'm guessing maybe not because apparently there is a > > > > > > > > > > risk of running out of LUT entries? > > > > > > > > > > > > > > > > > > It is not good idea to depend on boot loader so much. > > > > > > > > > > > > > > > > I meant "could this be programmed once when the Linux imx host > > > > > > > > controller driver is probed?" But from the below, it sounds like > > > > > > > > that's not possible in general because you don't have enough stream > > > > > > > > IDs to do that. > > > > > > > > > > > > > > Oh! sorry miss understand what your means. It is possible like what I did > > > > > > > at v3 version. But I think it is not good enough. > > > > > > > > > > > > > > > > > > > > > > > > Some hot plug devics > > > > > > > > > (SD7.0) may plug after system boot. Two PCIe instances shared one set > > > > > > > > > of 6bits stream id (total 64). Assume total 16 assign to two PCIe > > > > > > > > > controllers. each have 8 stream id. If use uboot assign it static, each > > > > > > > > > PCIe controller have below 8 devices. It will be failrue one controller > > > > > > > > > connect 7, another connect 9. but if dynamtic alloc when devices add, both > > > > > > > > > controller can work. > > > > > > > > > > > > > > > > > > Although we have not so much devices now, this way give us possility to > > > > > > > > > improve it in future. > > > > > > > > > > > > > > > > > > > It sounds like the consequences of running out of LUT entries are > > > > > > > > > > catastrophic, e.g., memory corruption from mis-directed DMA? If > > > > > > > > > > that's possible, I think we need to figure out how to prevent the > > > > > > > > > > device from being used, not just dev_warn() about it. > > > > > > > > > > > > > > > > > > Yes, but so far, we have not met such problem now. We can improve it when > > > > > > > > > we really face such problem. > > > > > > > > > > > > > > > > If this controller can only support DMA from a limited number of > > > > > > > > endpoints below it, I think we should figure out how to enforce that > > > > > > > > directly. Maybe we can prevent drivers from enabling bus mastering or > > > > > > > > something. I'm not happy with the idea of waiting for and debugging a > > > > > > > > report of data corruption. > > > > > > > > > > > > > > It may add a pre-add hook function to pci bridge. let me do more research. > > > > > > > > > > > > Hi Bjorn: > > > > > > > > > > > > int pci_setup_device(struct pci_dev *dev) > > > > > > { > > > > > > dev->error_state = pci_channel_io_normal; > > > > > > ... > > > > > > pci_fixup_device(pci_fixup_early, dev); > > > > > > > > > > > > ^^^ I can add fixup hook for pci_fixup_early. If not resource, > > > > > > I can set dev->error_state to pci_channel_io_frozen or > > > > > > pci_channel_io_perm_failure > > > > > > > > > > > > And add below check here after call hook function. > > > > > > > > > > > > if (dev->error_state != pci_channel_io_normal) > > > > > > return -EIO; > > > > > > > > > > > > } > > > > > > > > > > > > How do you think this method? If you agree, I can continue search device > > > > > > remove hook up. > > > > > > > > > > I think this would mean the device would not appear to be enumerated > > > > > at all, right? I.e., it wouldn't show up in lspci? And we couldn't > > > > > use even a pure programmed IO driver with no DMA or MSI? > > > > > > > > Make sense. Let me do more research on this. > > > > > > > > Frank > > > > > > > > > > I wonder if we should have a function pointer in struct > > > > > pci_host_bridge, kind of like the existing ->map_irq(), where we could > > > > > do host bridge-specific setup when enumerating a PCI device. > > > > > > Consider some device may no use MSI or DMA. It'd better set LUT when > > > allocate msi irq. I think insert a irq-domain in irq hierarchy. > > > > > > static const struct irq_domain_ops lut_pcie_msi_domain_ops = { > > > .alloc = lut_pcie_irq_domain_alloc, > > > .free = lut_pcie_irq_domain_free, > > > }; > > > > > > int dw_pcie_allocate_domains(struct dw_pcie_rp *pp) > > > { > > > struct fwnode_handle *fwnode = of_node_to_fwnode(pci->dev->of_node); > > > > > > pp->irq_domain = irq_domain_create_hierarchy(...) > > > > > > pp->msi_domain = pci_msi_create_irq_domain(...); > > > > > > return 0; > > > } > > > > > > Manage lut stream id in lut_pcie_irq_domain_alloc() and > > > lut_pcie_irq_domain_free(). > > > > > > So failure happen only when driver use MSI and no-stream ID avaiable. It > > > should be better than failure when add devices. Some devices may not use > > > at all. > > > > I'm not an IRQ expert, but it sounds plausible. There might even be > > an opportunity to fall back to INTx if there's no stream ID available > > for MSI? > > Sorry, I think this was a half-baked thought. Exhaustion of stream > IDs should be an uncommon situation, and the important thing is to > prevent terrible things from happening. > > I don't think it's worth bending over backwards to make everything > possible limp along. If it's easy to just make the device > inaccessible, that's fine. If there's a simple way to make it > available but keep from enabling bus mastering, we could do that too, > but only if it's really simple. Mani mentions qcom use simple method to config all lut when probe at qcom_pcie_config_sid_1_9_0(). It is similar with my v3 version. https://lore.kernel.org/imx/20240402-pci2_upstream-v3-8-803414bdb430@xxxxxxx/ Of course both have not resolved run-out sid problems. But genenerally, one PCIE slot only connect one devices. static alloc 8/16 sid are enough for 99.9% user case. best regards Frank Li