On Wed, Mar 29, 2017 at 10:13 AM, Oza Oza <oza.oza@xxxxxxxxxxxx> wrote:
> On Tue, Mar 28, 2017 at 7:59 PM, Robin Murphy <robin.murphy@xxxxxxx> wrote:
>> On 28/03/17 06:27, Oza Oza wrote:
>>> On Mon, Mar 27, 2017 at 8:16 PM, Rob Herring <robh@xxxxxxxxxx> wrote:
>>>> On Sat, Mar 25, 2017 at 12:31 AM, Oza Pawandeep <oza.oza@xxxxxxxxxxxx> wrote:
>>>>> It is possible that a PCI device supports 64-bit DMA addressing,
>>>>> and thus its driver sets the device's dma_mask to DMA_BIT_MASK(64);
>>>>> however, the PCI host bridge may have limitations on inbound
>>>>> transaction addressing. As an example, consider an NVMe SSD device
>>>>> connected to the iproc-PCIe controller.
>>>>>
>>>>> Currently, the IOMMU DMA ops only consider the PCI device's dma_mask
>>>>> when allocating an IOVA. This is particularly problematic on
>>>>> ARM/ARM64 SoCs where the IOMMU (i.e. SMMU) translates IOVA to
>>>>> PA for inbound transactions only after the PCI host has forwarded
>>>>> these transactions onto the SoC I/O bus. This means that on such
>>>>> ARM/ARM64 SoCs the IOVA of inbound transactions has to honor the
>>>>> addressing restrictions of the PCI host.
>>>>>
>>>>> The current PCIe framework and OF framework integration assumes
>>>>> dma-ranges in a way where memory-mapped devices define their dma-ranges:
>>>>> dma-ranges: (child-bus-address, parent-bus-address, length).
>>>>>
>>>>> But iproc-based SoCs, and even Rcar-based SoCs, have PCI-world dma-ranges:
>>>>> dma-ranges = <0x43000000 0x00 0x00 0x00 0x00 0x80 0x00>;
>>>>
>>>> If you implement a common function, then I expect to see other users
>>>> converted to use it. There are also PCI hosts in arch/powerpc that parse
>>>> dma-ranges.
>>>
>>> The common function should be similar to what
>>> of_pci_get_host_bridge_resources() is doing right now;
>>> it parses the ranges property.
>>>
>>> The new function would look like the following:
>>>
>>> of_pci_get_dma_ranges(struct device_node *dev, struct list_head *resources)
>>> where resources would return the dma-ranges.
>>>
>>> But right now, if you look at the patch, of_dma_configure() calls the new
>>> function, which actually returns the largest possible size.
>>> So this new function has to be generic in a way that other PCI hosts
>>> can use it, but certainly iproc (Broadcom SoC) and Rcar-based SoCs can
>>> use it for sure.
>>>
>>> Having powerpc use it is a separate exercise, since I do not have
>>> access to other PCI hosts such as powerpc, but we can work that out
>>> with them on this forum if required.
>>>
>>> So overall, of_pci_get_dma_ranges() has to serve the following two purposes:
>>>
>>> 1) It has to return the largest possible size to of_dma_configure() to
>>> generate the largest possible dma_mask.
>>>
>>> 2) It also has to return the parsed resources (dma-ranges) to its users.
>>>
>>> So, to address the above needs:
>>>
>>> of_pci_get_dma_ranges(struct device_node *dev, struct list_head
>>> *resources, u64 *size)
>>>
>>> dev -> device node.
>>> resources -> dma-ranges as an allocated list.
>>> size -> highest possible size, to generate the dma_mask for
>>> of_dma_configure().
>>>
>>> Let me know how this sounds.
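Purpose (1) above amounts to feeding one conservative size into the same
mask clamping that of_dma_configure() already performs on the single
(dma_addr, size) pair it gets back from of_dma_get_range(). A rough,
illustrative sketch only - of_pci_get_dma_ranges() is the helper proposed
in this thread, not an existing API, and the wrapper name is made up:

#include <linux/device.h>
#include <linux/dma-mapping.h>
#include <linux/kernel.h>
#include <linux/list.h>
#include <linux/log2.h>
#include <linux/of.h>

/* proposed helper from this thread (hypothetical, see below) */
int of_pci_get_dma_ranges(struct device_node *np,
                          struct list_head *resources, u64 *size);

static void pci_dma_mask_from_dma_ranges(struct device *dev,
                                         struct device_node *bridge_np)
{
        u64 dma_addr = 0, size;

        if (of_pci_get_dma_ranges(bridge_np, NULL, &size))
                size = dev->coherent_dma_mask + 1;   /* fall back to default */

        /* same clamping of_dma_configure() applies to of_dma_get_range()'s output */
        dev->coherent_dma_mask = min(dev->coherent_dma_mask,
                                     DMA_BIT_MASK(ilog2(dma_addr + size)));
        *dev->dma_mask = min(*dev->dma_mask,
                             DMA_BIT_MASK(ilog2(dma_addr + size)));
}
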
>> Note that the point of passing PCI host bridges into of_dma_configure()
>> in the first place was to avoid having some separate PCI-specific path
>> for DMA configuration. I worry that introducing bus-specific dma-ranges
>> parsing largely defeats that, since we end up with the worst of both
>> worlds; effectively-duplicated code, and/or a load of extra complexity
>> to then attempt to reconverge the divergent paths (there really
>> shouldn't be any need to allocate a list of anything). Given that
>> of_translate_dma_address() is already bus-agnostic, it hardly seems
>> justifiable for its caller not to be so as well, especially when it
>> mostly just comes down to getting the right #address-cells value.
>>
>> The patch below is actually enough to make typical cases work, but is
>> vile, so I'm not seriously considering it (hence I've not bothered
>> making IOMMU configuration handle all circumstances). What it has served
>> to do, though, is give me a clear idea of how to properly sort out the
>> not-quite-right device/parent assumptions between of_dma_configure() and
>> of_dma_get_range() rather than bodging around them any further - stay tuned.
>>
>> Robin.
>>
>> ----->8-----
>> From: Robin Murphy <robin.murphy@xxxxxxx>
>> Subject: [PATCH] of/pci: Use child node for DMA configuration
>>
>> of_dma_configure() expects to be passed an OF node representing the
>> device being configured - for PCI devices we currently pass the node of
>> the appropriate host controller, which sort of works for inherited
>> properties which may appear at any level, like "dma-coherent", but falls
>> apart for properties which actually care about specific device-parent
>> relationships, like "dma-ranges".
>>
>> Solve this by attempting to find a suitable child node if the PCI
>> hierarchy is actually represented in DT, and if not then faking one up
>> as a last resort, to make all of DMA configuration work as expected.
>>
>> Signed-off-by: Robin Murphy <robin.murphy@xxxxxxx>
>> ---
>>  drivers/iommu/of_iommu.c |  3 ++-
>>  drivers/pci/of.c         | 24 ++++++++++++++++++++++++
>>  drivers/pci/probe.c      | 14 +++++++++++++-
>>  include/linux/pci.h      |  3 +++
>>  4 files changed, 42 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
>> index 2683e9fc0dcf..35c97b945c15 100644
>> --- a/drivers/iommu/of_iommu.c
>> +++ b/drivers/iommu/of_iommu.c
>> @@ -146,7 +146,8 @@ const struct iommu_ops *of_iommu_configure(struct device *dev,
>>  	int idx = 0;
>>
>>  	if (dev_is_pci(dev))
>> -		return of_pci_iommu_configure(to_pci_dev(dev), master_np);
>> +		return of_pci_iommu_configure(to_pci_dev(dev),
>> +					      of_get_next_parent(master_np));
>>
>>  	/*
>>  	 * We don't currently walk up the tree looking for a parent IOMMU.
>> diff --git a/drivers/pci/of.c b/drivers/pci/of.c
>> index e112da11630e..96eece1c423d 100644
>> --- a/drivers/pci/of.c
>> +++ b/drivers/pci/of.c
>> @@ -88,3 +88,27 @@ struct irq_domain *pci_host_bridge_of_msi_domain(struct pci_bus *bus)
>>  	return NULL;
>>  #endif
>>  }
>> +
>> +struct device_node *pci_dev_get_dma_of_node(struct pci_dev *dev)
>> +{
>> +	struct device_node *np;
>> +	struct pci_bus *bus = dev->bus;
>> +
>> +	/* Is the device itself actually described in the DT? */
>> +	np = of_node_get(dev->dev.of_node);
>> +	if (np)
>> +		return np;
>> +
>> +	/* Or is some intermediate bridge? That would do... */
>> +	for (bus = dev->bus; bus->parent; bus = bus->parent) {
>> +		np = of_node_get(bus->bridge->of_node);
>> +		if (np)
>> +			return np;
>> +	}
>> +
>> +	/* Failing that, any child of the same host bridge? */
>> +	if (bus->bridge->parent)
>> +		np = of_get_next_child(bus->bridge->parent->of_node, NULL);
>> +
>> +	return np;
>> +}
>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>> index dfc9a2794141..4f7ade64aa3e 100644
>> --- a/drivers/pci/probe.c
>> +++ b/drivers/pci/probe.c
>> @@ -1906,7 +1906,19 @@ static void pci_dma_configure(struct pci_dev *dev)
>>
>>  	if (IS_ENABLED(CONFIG_OF) &&
>>  		bridge->parent && bridge->parent->of_node) {
>> -		of_dma_configure(&dev->dev, bridge->parent->of_node);
>> +		struct device_node *np, tmp = {0};
>> +
>> +		np = pci_dev_get_dma_of_node(dev);
>> +		if (!np) {
>> +			np = &tmp;
>> +			of_node_set_flag(np, OF_DETACHED);
>> +			of_node_init(np);
>> +			tmp.parent = bridge->parent->of_node;
>> +		}
>> +
>> +		of_dma_configure(&dev->dev, np);
>> +		if (np != &tmp)
>> +			of_node_put(np);
>>  	} else if (has_acpi_companion(bridge)) {
>>  		struct acpi_device *adev = to_acpi_device_node(bridge->fwnode);
>>  		enum dev_dma_attr attr = acpi_get_dma_attr(adev);
>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>> index eb3da1a04e6c..94ecd1817f58 100644
>> --- a/include/linux/pci.h
>> +++ b/include/linux/pci.h
>> @@ -2086,6 +2086,7 @@ void pci_release_of_node(struct pci_dev *dev);
>>  void pci_set_bus_of_node(struct pci_bus *bus);
>>  void pci_release_bus_of_node(struct pci_bus *bus);
>>  struct irq_domain *pci_host_bridge_of_msi_domain(struct pci_bus *bus);
>> +struct device_node *pci_dev_get_dma_of_node(struct pci_dev *dev);
>>
>>  /* Arch may override this (weak) */
>>  struct device_node *pcibios_get_phb_of_node(struct pci_bus *bus);
>> @@ -2110,6 +2111,8 @@ static inline struct device_node *
>>  pci_device_to_OF_node(const struct pci_dev *pdev) { return NULL; }
>>  static inline struct irq_domain *
>>  pci_host_bridge_of_msi_domain(struct pci_bus *bus) { return NULL; }
>> +static inline struct device_node *
>> +pci_dev_get_dma_of_node(struct pci_dev *dev) { return NULL; }
>>  #endif /* CONFIG_OF */
>>
>>  #ifdef CONFIG_ACPI
>>
>
> The PCI and memory-mapped device worlds are different. Of course, if you
> look at it from the IOMMU perspective, all the masters are the same to
> the IOMMU, but the original intention of the patch is to try to solve
> two problems:
>
> 1) Expose a generic API for PCI-world clients to configure their
> dma-ranges; right now there is none.
> 2) The same API can be used by the IOMMU code to derive the dma_mask.
>
> The implementation tries to handle dma-ranges for both memory-mapped
> and PCI devices, which I think is overkill.
> Besides, we have a different configuration of dma-ranges depending on
> whether the IOMMU is enabled or not.
>
> #define PCIE_DMA_RANGES dma-ranges = <0x43000000 0x00 0x80000000 0x00 0x80000000 0x00 0x80000000 \
>                                       0x43000000 0x08 0x00000000 0x08 0x00000000 0x08 0x00000000 \
>                                       0x43000000 0x80 0x00000000 0x80 0x00000000 0x40 0x00000000>;
>
> of_dma_get_range() is written with respect to traditional memory
> ranges; it was not meant for PCI dma-ranges.
>
> Regards,
> Oza.

Rob, a gentle reminder on your viewpoint.

My take on the whole thing is as follows:

1) of_dma_get_range() is supposed to handle traditional dma-ranges (not
the PCI ones).

2) We need a separate function for PCI users such as iproc- or Rcar-based
SoCs to have their dma-ranges parsed, which can be extended later, e.g.
for the PCI flags, which have their own meanings.

Besides, of_dma_configure() is written to pass out only a single
(*dma_addr, *size) pair; handling multiple ranges there and passing only
one range out is not desirable.
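To make this concrete, the kind of helper I have in mind would look
roughly like the sketch below. This is illustrative only - the function
does not exist yet, the name and signature simply follow the proposal
earlier in the thread, and the dma-ranges walk is patterned on the local
pci_dma_range_parser_init() copies that host drivers such as pcie-rcar
and pci-xgene already carry:

#include <linux/errno.h>
#include <linux/ioport.h>
#include <linux/list.h>
#include <linux/of.h>
#include <linux/of_address.h>
#include <linux/resource_ext.h>

int of_pci_get_dma_ranges(struct device_node *np,
                          struct list_head *resources, u64 *size)
{
        struct of_pci_range_parser parser;
        struct of_pci_range range;
        struct resource_entry *entry;
        const int na = 3, ns = 2;       /* PCI #address-cells / #size-cells */
        u64 max_end = 0;
        int rlen;

        /* point the generic of_pci_range parser at "dma-ranges" */
        parser.node = np;
        parser.pna = of_n_addr_cells(np);
        parser.np = parser.pna + na + ns;
        parser.range = of_get_property(np, "dma-ranges", &rlen);
        if (!parser.range)
                return -ENOENT;
        parser.end = parser.range + rlen / sizeof(__be32);

        for_each_of_pci_range(&parser, &range) {
                if (range.pci_addr + range.size > max_end)
                        max_end = range.pci_addr + range.size;

                /* purpose (2): hand every inbound window back to the caller */
                if (!resources)
                        continue;

                entry = resource_list_create_entry(NULL, 0);
                if (!entry)
                        return -ENOMEM;
                entry->res->start = range.pci_addr;
                entry->res->end = range.pci_addr + range.size - 1;
                entry->res->flags = range.flags;
                resource_list_add_tail(entry, resources);
        }

        if (!max_end)
                return -ENODEV;

        /* purpose (1): largest span visible to the endpoint, for the dma_mask */
        *size = max_end;
        return 0;
}

That way of_dma_configure() gets one conservative size for the mask,
while PCI host drivers can still see every inbound window, without
touching of_dma_get_range() at all.
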
Also, you would not like to handle the PCI flags (the first portion of
the DT entry) in of_dma_get_range(), should that be required in future.

This new function will be similar to of_pci_get_host_bridge_resources(),
which I am writing/improving right now.

Regards,
Oza.