On Fri, May 5, 2017 at 9:21 PM, Robin Murphy <robin.murphy@xxxxxxx> wrote: > On 04/05/17 19:52, Oza Oza wrote: >> On Thu, May 4, 2017 at 11:50 PM, Robin Murphy <robin.murphy@xxxxxxx> wrote: >>> On 03/05/17 05:46, Oza Pawandeep wrote: >>>> this patch reserves the iova for PCI masters. >>>> ARM64 based SOCs may have scattered memory banks. >>>> such as iproc based SOC has >>>> >>>> <0x00000000 0x80000000 0x0 0x80000000>, /* 2G @ 2G */ >>>> <0x00000008 0x80000000 0x3 0x80000000>, /* 14G @ 34G */ >>>> <0x00000090 0x00000000 0x4 0x00000000>, /* 16G @ 576G */ >>>> <0x000000a0 0x00000000 0x4 0x00000000>; /* 16G @ 640G */ >>>> >>>> but incoming PCI transcation addressing capability is limited >>>> by host bridge, for example if max incoming window capability >>>> is 512 GB, then 0x00000090 and 0x000000a0 will fall beyond it. >>>> >>>> to address this problem, iommu has to avoid allocating iovas which >>>> are reserved. which inturn does not allocate iova if it falls into hole. >>> >>> I don't necessarily disagree with doing this, as we could do with facing >>> up to the issue of discontiguous DMA ranges in particular (I too have a >>> platform with this problem), but I'm still not overly keen on pulling DT >>> specifics into this layer. More than that, though, if we are going to do >>> it, then we should do it for all devices with a restrictive >>> "dma-ranges", not just PCI ones. >>> >> >> How do you propose to do it ? >> >> my thinking is this: >> iova_reserve_pci_windows is written specific for PCI, and I am adding there. >> >> ideally >> struct pci_host_bridge should have new member: >> >> struct list_head inbound_windows; /* resource_entry */ >> >> but somehow this resource have to be filled much before >> iommu_dma_init_domain happens. >> and use brdge resource directly in iova_reserve_pci_windows as it is >> already doing it for outbound memory. >> >> this will detach the DT specifics from dma-immu layer. >> let me know how this sounds. > > Please check the state of the code currently queued in Joerg's tree and > in linux-next - iommu_dma_get_resv_regions() has room for > device-agnostic stuff before the if (!dev_is_pci(dev)) check. > > Furthermore, with the probe-deferral changes we end up with a common > dma_configure() routine to abstract the firmware-specifics of > of_dma_configure() vs. acpi_dma_configure(), so it would make sense to > give drivers etc. a similar interface for interrogating ranges. i.e. > some common function that abstracts the difference between parsing DT > dma-ranges vs. the ACPI _DMA object, either with a list-based get/put > model or perhaps an iterator with a user-provided callback (so users > could process in-place or create their own list as necessary). Unless of > course we go all the way to making the ranges an inherent part of the > device layer like some MIPS platforms currently do. > > Robin. > Hi Robin, your above comments are taken care to the best of my understanding. Kindly have a look at the PATCH v7. the whole patch-set takes care of IOVA reservation for PCI inbound memory. now there is no dependency on IOMMU and OF layer. Regards, Oza. >>>> Bug: SOC-5216 >>>> Change-Id: Icbfc99a045d730be143fef427098c937b9d46353 >>>> Signed-off-by: Oza Pawandeep <oza.oza@xxxxxxxxxxxx> >>>> Reviewed-on: http://gerrit-ccxsw.broadcom.net/40760 >>>> Reviewed-by: vpx_checkpatch status <vpx_checkpatch@xxxxxxxxxxxx> >>>> Reviewed-by: CCXSW <ccxswbuild@xxxxxxxxxxxx> >>>> Tested-by: vpx_autobuild status <vpx_autobuild@xxxxxxxxxxxx> >>>> Tested-by: vpx_smoketest status <vpx_smoketest@xxxxxxxxxxxx> >>>> Tested-by: CCXSW <ccxswbuild@xxxxxxxxxxxx> >>>> Reviewed-by: Scott Branden <scott.branden@xxxxxxxxxxxx> >>>> >>>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c >>>> index 48d36ce..08764b0 100644 >>>> --- a/drivers/iommu/dma-iommu.c >>>> +++ b/drivers/iommu/dma-iommu.c >>>> @@ -27,6 +27,7 @@ >>>> #include <linux/iova.h> >>>> #include <linux/irq.h> >>>> #include <linux/mm.h> >>>> +#include <linux/of_pci.h> >>>> #include <linux/pci.h> >>>> #include <linux/scatterlist.h> >>>> #include <linux/vmalloc.h> >>>> @@ -171,8 +172,12 @@ static void iova_reserve_pci_windows(struct pci_dev *dev, >>>> struct iova_domain *iovad) >>>> { >>>> struct pci_host_bridge *bridge = pci_find_host_bridge(dev->bus); >>>> + struct device_node *np = bridge->dev.parent->of_node; >>>> struct resource_entry *window; >>>> unsigned long lo, hi; >>>> + int ret; >>>> + dma_addr_t tmp_dma_addr = 0, dma_addr; >>>> + LIST_HEAD(res); >>>> >>>> resource_list_for_each_entry(window, &bridge->windows) { >>>> if (resource_type(window->res) != IORESOURCE_MEM && >>>> @@ -183,6 +188,36 @@ static void iova_reserve_pci_windows(struct pci_dev *dev, >>>> hi = iova_pfn(iovad, window->res->end - window->offset); >>>> reserve_iova(iovad, lo, hi); >>>> } >>>> + >>>> + /* PCI inbound memory reservation. */ >>>> + ret = of_pci_get_dma_ranges(np, &res); >>>> + if (!ret) { >>>> + resource_list_for_each_entry(window, &res) { >>>> + struct resource *res_dma = window->res; >>>> + >>>> + dma_addr = res_dma->start - window->offset; >>>> + if (tmp_dma_addr > dma_addr) { >>>> + pr_warn("PCI: failed to reserve iovas; ranges should be sorted\n"); >>> >>> I don't see anything in the DT spec about the entries having to be >>> sorted, and it's not exactly impossible to sort a list if you need it so >>> (and if I'm being really pedantic, one could still trigger this with a >>> list that *is* sorted, only by different criteria). >>> >> >> we have to sort it the way we want then. I can make it sort then. >> thanks for the suggestion. >> >>> Robin. >>> >>>> + return; >>>> + } >>>> + if (tmp_dma_addr != dma_addr) { >>>> + lo = iova_pfn(iovad, tmp_dma_addr); >>>> + hi = iova_pfn(iovad, dma_addr - 1); >>>> + reserve_iova(iovad, lo, hi); >>>> + } >>>> + tmp_dma_addr = window->res->end - window->offset; >>>> + } >>>> + /* >>>> + * the last dma-range should honour based on the >>>> + * 32/64-bit dma addresses. >>>> + */ >>>> + if (tmp_dma_addr < DMA_BIT_MASK(sizeof(dma_addr_t) * 8)) { >>>> + lo = iova_pfn(iovad, tmp_dma_addr); >>>> + hi = iova_pfn(iovad, >>>> + DMA_BIT_MASK(sizeof(dma_addr_t) * 8) - 1); >>>> + reserve_iova(iovad, lo, hi); >>>> + } >>>> + } >>>> } >>>> >>>> /** >>>> >>> > -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html