On 04/10/17 14:50, Lorenzo Pieralisi wrote: > On Wed, Oct 04, 2017 at 12:22:04PM +0100, Marc Zyngier wrote: >> On 27/09/17 14:32, Shameer Kolothum wrote: >>> From: John Garry <john.garry@xxxxxxxxxx> >>> >>> On some platforms msi-controller address regions have to be excluded >>> from normal IOVA allocation in that they are detected and decoded in >>> a HW specific way by system components and so they cannot be considered >>> normal IOVA address space. >>> >>> Add a helper function that retrieves msi address regions through device >>> tree msi mapping, so that these regions will not be translated by IOMMU >>> and will be excluded from IOVA allocations. >>> >>> Signed-off-by: John Garry <john.garry@xxxxxxxxxx> >>> [Shameer: Modified msi-parent retrieval logic] >>> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@xxxxxxxxxx> >>> --- >>> drivers/iommu/of_iommu.c | 95 ++++++++++++++++++++++++++++++++++++++++++++++++ >>> include/linux/of_iommu.h | 10 +++++ >>> 2 files changed, 105 insertions(+) >>> >>> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c >>> index e60e3db..ffd7fb7 100644 >>> --- a/drivers/iommu/of_iommu.c >>> +++ b/drivers/iommu/of_iommu.c >>> @@ -21,6 +21,7 @@ >>> #include <linux/iommu.h> >>> #include <linux/limits.h> >>> #include <linux/of.h> >>> +#include <linux/of_address.h> >>> #include <linux/of_iommu.h> >>> #include <linux/of_pci.h> >>> #include <linux/slab.h> >>> @@ -138,6 +139,14 @@ static int of_iommu_xlate(struct device *dev, >>> return ops->of_xlate(dev, iommu_spec); >>> } >>> >>> +static int __get_pci_rid(struct pci_dev *pdev, u16 alias, void *data) >>> +{ >>> + u32 *rid = data; >>> + >>> + *rid = alias; >>> + return 0; >>> +} >>> + >>> struct of_pci_iommu_alias_info { >>> struct device *dev; >>> struct device_node *np; >>> @@ -163,6 +172,73 @@ static int of_pci_iommu_init(struct pci_dev *pdev, u16 alias, void *data) >>> return info->np == pdev->bus->dev.of_node; >>> } >>> >>> +static int of_iommu_alloc_resv_region(struct device_node *np, >>> + struct list_head *head) >>> +{ >>> + int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO; >>> + struct iommu_resv_region *region; >>> + struct resource res; >>> + int err; >>> + >>> + err = of_address_to_resource(np, 0, &res); >> >> What is the rational for registering the first region only? Surely you >> can have more than one region in an MSI controller (yes, your particular >> case is the ITS which has only one, but we're trying to do something >> generic here). >> >> Another issue I have with this code is that it inserts all of the ITS >> MMIO in the RESV_MSI range. As long as we don't generate any page tables >> for this, we're fine. But if we ever change this, we'll end-up with the >> ITS programming interface being exposed to a device, which wouldn't be >> acceptable. >> >> Why can't we have some generic property instead, that would describe the >> actual ranges that cannot be translated? That way, no random insertion >> of a random range, and relatively future proof. Indeed. Furthermore, IORT has the advantage of being limited to a few distinct device types and SBSA-compliant system topologies, so the ITS-chasing is reasonable there (modulo only reserving GITS_TRANSLATER). The scope of DT covers more embedded things as well like PCI host controllers with internal MSI doorbells, and potentially even direct-mapped memory regions for things like bootloader framebuffers to prevent display glitches - a generic address/size/flags description of a region could work for just about everything. > IORT code has the same issue (ie it reserves all ITS regions) and I do > not know where a property can be added to describe those ranges (IORT > specs ? I'd rather not) in ACPI other than the IORT tables entries > themselves. > >> I'm not sure where to declare it (PCIe node? IOMMU node?), but it'd feel >> like a much nicer and simpler solution to this problem. > > It could be but if we throw ACPI into the picture we have to knock > together Hisilicon specific namespace bindings to handle this and > quickly. As above I'm happy with the ITS-specific solution for ACPI given the limits of IORT. What I had in mind for DT was something tied in with the generic IOMMU bindings. Something like this is probably easiest to handle, but does rather spread the information around: pci { ... iommu-map = <0 0 &iommu1 0x10000>; iommu-reserved-ranges = <0x12340000 0x1000 IOMMU_MSI>, <0x34560000 0x1000 IOMMU_MSI>; }; display { ... iommus = <&iommu1 0x20000>; /* simplefb region */ iommu-reserved-ranges = <0x80080000 0x80000 IOMMU_DIRECT>, }; Alternatively, something inspired by reserved-memory might perhaps be conceptually neater, at the risk of being more complicated: iommu1: iommu@acbd0000 { ... #iommu-cells = <1>; iommu-reserved-ranges { #address-cells = <1>; #size-cells = <1>; its0_resv: msi@12340000 { compatible = "iommu-msi-region"; reg = <0x12340000 0x1000>; }; its1_resv: msi@34560000 { compatible = "iommu-msi-region"; reg = <0x34560000 0x1000>; }; fb_resv: direct@12340000 { compatible = "iommu-direct-region"; reg = <0x80080000 0x80000>; }; }; }; DT folks - any opinions? Robin. -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html