On 05/10/17 13:37, John Garry wrote: > On 05/10/2017 12:07, Robin Murphy wrote: >> On 04/10/17 14:50, Lorenzo Pieralisi wrote: >>> On Wed, Oct 04, 2017 at 12:22:04PM +0100, Marc Zyngier wrote: >>>> On 27/09/17 14:32, Shameer Kolothum wrote: >>>>> From: John Garry <john.garry@xxxxxxxxxx> >>>>> >>>>> On some platforms msi-controller address regions have to be excluded >>>>> from normal IOVA allocation in that they are detected and decoded in >>>>> a HW specific way by system components and so they cannot be >>>>> considered >>>>> normal IOVA address space. >>>>> >>>>> Add a helper function that retrieves msi address regions through >>>>> device >>>>> tree msi mapping, so that these regions will not be translated by >>>>> IOMMU >>>>> and will be excluded from IOVA allocations. >>>>> >>>>> Signed-off-by: John Garry <john.garry@xxxxxxxxxx> >>>>> [Shameer: Modified msi-parent retrieval logic] >>>>> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@xxxxxxxxxx> >>>>> --- >>>>> drivers/iommu/of_iommu.c | 95 >>>>> ++++++++++++++++++++++++++++++++++++++++++++++++ >>>>> include/linux/of_iommu.h | 10 +++++ >>>>> 2 files changed, 105 insertions(+) >>>>> >>>>> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c >>>>> index e60e3db..ffd7fb7 100644 >>>>> --- a/drivers/iommu/of_iommu.c >>>>> +++ b/drivers/iommu/of_iommu.c >>>>> @@ -21,6 +21,7 @@ >>>>> #include <linux/iommu.h> >>>>> #include <linux/limits.h> >>>>> #include <linux/of.h> >>>>> +#include <linux/of_address.h> >>>>> #include <linux/of_iommu.h> >>>>> #include <linux/of_pci.h> >>>>> #include <linux/slab.h> >>>>> @@ -138,6 +139,14 @@ static int of_iommu_xlate(struct device *dev, >>>>> return ops->of_xlate(dev, iommu_spec); >>>>> } >>>>> >>>>> +static int __get_pci_rid(struct pci_dev *pdev, u16 alias, void *data) >>>>> +{ >>>>> + u32 *rid = data; >>>>> + >>>>> + *rid = alias; >>>>> + return 0; >>>>> +} >>>>> + >>>>> struct of_pci_iommu_alias_info { >>>>> struct device *dev; >>>>> struct device_node *np; >>>>> @@ -163,6 +172,73 @@ static int of_pci_iommu_init(struct pci_dev >>>>> *pdev, u16 alias, void *data) >>>>> return info->np == pdev->bus->dev.of_node; >>>>> } >>>>> >>>>> +static int of_iommu_alloc_resv_region(struct device_node *np, >>>>> + struct list_head *head) >>>>> +{ >>>>> + int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO; >>>>> + struct iommu_resv_region *region; >>>>> + struct resource res; >>>>> + int err; >>>>> + >>>>> + err = of_address_to_resource(np, 0, &res); >>>> >>>> What is the rational for registering the first region only? Surely you >>>> can have more than one region in an MSI controller (yes, your >>>> particular >>>> case is the ITS which has only one, but we're trying to do something >>>> generic here). >>>> >>>> Another issue I have with this code is that it inserts all of the ITS >>>> MMIO in the RESV_MSI range. As long as we don't generate any page >>>> tables >>>> for this, we're fine. But if we ever change this, we'll end-up with the >>>> ITS programming interface being exposed to a device, which wouldn't be >>>> acceptable. >>>> >>>> Why can't we have some generic property instead, that would describe >>>> the >>>> actual ranges that cannot be translated? That way, no random insertion >>>> of a random range, and relatively future proof. >> >> Indeed. Furthermore, IORT has the advantage of being limited to a few >> distinct device types and SBSA-compliant system topologies, so the >> ITS-chasing is reasonable there (modulo only reserving GITS_TRANSLATER). >> The scope of DT covers more embedded things as well like PCI host >> controllers with internal MSI doorbells, and potentially even >> direct-mapped memory regions for things like bootloader framebuffers to >> prevent display glitches - a generic address/size/flags description of a >> region could work for just about everything. >> >>> IORT code has the same issue (ie it reserves all ITS regions) and I do >>> not know where a property can be added to describe those ranges (IORT >>> specs ? I'd rather not) in ACPI other than the IORT tables entries >>> themselves. >>> >>>> I'm not sure where to declare it (PCIe node? IOMMU node?), but it'd >>>> feel >>>> like a much nicer and simpler solution to this problem. >>> >>> It could be but if we throw ACPI into the picture we have to knock >>> together Hisilicon specific namespace bindings to handle this and >>> quickly. >> >> As above I'm happy with the ITS-specific solution for ACPI given the >> limits of IORT. What I had in mind for DT was something tied in with the >> generic IOMMU bindings. Something like this is probably easiest to >> handle, but does rather spread the information around: >> >> >> pci { >> ... >> iommu-map = <0 0 &iommu1 0x10000>; >> iommu-reserved-ranges = <0x12340000 0x1000 IOMMU_MSI>, >> <0x34560000 0x1000 IOMMU_MSI>; >> }; >> >> display { >> ... >> iommus = <&iommu1 0x20000>; >> /* simplefb region */ >> iommu-reserved-ranges = <0x80080000 0x80000 IOMMU_DIRECT>, >> }; >> >> >> Alternatively, something inspired by reserved-memory might perhaps be >> conceptually neater, at the risk of being more complicated: >> >> >> iommu1: iommu@acbd0000 { >> ... >> #iommu-cells = <1>; >> >> iommu-reserved-ranges { >> #address-cells = <1>; >> #size-cells = <1>; >> >> its0_resv: msi@12340000 { >> compatible = "iommu-msi-region"; >> reg = <0x12340000 0x1000>; >> }; >> >> its1_resv: msi@34560000 { >> compatible = "iommu-msi-region"; >> reg = <0x34560000 0x1000>; >> }; >> >> fb_resv: direct@12340000 { >> compatible = "iommu-direct-region"; >> reg = <0x80080000 0x80000>; >> }; >> }; >> }; >> >> > > If we did this, wouldn't it be easier to create dangerous reserved > regions, like our ITS region which Marc was concerned by? It's not so > hard to get dts changes into the kernel. Well, yeah. It's also equally easy to add some peripheral register region to the /memory node and watch hilarity ensue. The solution to both is "don't put bogus crap in your DT". There's a big difference between wilfully misdescribing your platform requirements vs. having the kernel automagically infer something but leave a hole open in the process. Robin. -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html