Hi Rob, On 10/02/17 18:41, Rob Clark wrote: > For devices with iommu(s) in secure mode, we cannot touch global > registers, and we have to live with the context -> sid mapping that > the secure world has set up for us. > > This enables, for example db410c (apq8016) devices to use the up- > stream arm-smmu driver. This is the last major hurdle for having > the gpu working out of the box on an upstream kernel. > > NOTE: at this point, it works but I haven't spent any time thinking > much about the bindings. Since we can't read the global regs, we > need to get all the device config info from DT (or at least anything > that can't be hard-coded). This approach seems, I have to say, unworkably horrible. I'm absolutely against the idea of pretending we have access to global state which we don't, then piling a facsimile of that state into the DT for no reason other than to keep the overcomplicated pretence up. This configuration comes out looking and behaving like a discrete IOMMU (see e.g. rockchip-iommu for inspiration) - if we have to support it, it would make far more sense to simply describe what we have, i.e. a "qcom,apq8016-smmu-context-bank" with a single interrupt and #iommu-cells = <0>. They'd want their own probe routine and private data (AFAICS more or less just a base address, an IRQ, ID features and an iommu_group), and probably a separate iommu_ops because you'd want to handle {add,remove}_device() and device_group() significantly differently. By the time we get up to {attach,remove}_dev() it might be clean enough to dynamically handle both cases in the same code, especially with a little general refactor of the ARM_SMMU_CB* arithmetic, and once we get to operating on arm_smmu_domains there should be no real difference. > Also, this only works for non-secure contexts. For secure contexts, > even things like map/unmap need to go via scm, so that bypasses > even more of arm-smmu. I'm not sure if it is better to have all of > those special paths added to arm-smmu. Or have a 2nd iommu driver > for the secure contexts. (I think the second path only works since > I don't think the CPU is really touching the hw at all for secure > contexts.) > > Not in lieu of bindings docs (which will come before this is more > than just an RFC), but here is an example of what the DT bindings > look like: > > gpu_iommu: qcom,iommu@1f00000 { > compatible = "qcom,smmu-v2"; > reg = <0x1f00000 0x10000>; > > #global-interrupts = <1>; > interrupts = > <GIC_SPI 43 0>, // global > <GIC_SPI 241 0>, // unk0 > <GIC_SPI 241 0>, // gfx3d_user > <GIC_SPI 242 0>; // gfx3d_priv > > qcom,stream-to-cb = < > 0x0002 // unk0 > 0x0000 // gfx3d_user > 0x0001 // gfx3d_priv > >; > > #iommu-cells = <1>; > > clocks = <&gcc GCC_SMMU_CFG_CLK>, > <&gcc GCC_GFX_TCU_CLK>; > clock-names = "smmu_iface_clk", "smmu_bus_clk"; > > qcom,cb-count = <3>; > qcom,iommu-secure-id = <18>; > qcom,mapping-groups-count = <3>; > > status = "okay"; > }; > > Since we must live with the assignment of stream-id's to context bank > mapping that the secure world has set up for us, the "qcom,stream-to-cb" > binding gives a mapping table of sid to cb. (Indexed by cb, value is > the sid.) I'm not 100% sure what to do about devices with multiple > sid's.. if I understand how things work properly, we could probably > make the values in this table the result of OR'ing all the sids > together. > > The "qcom,cb-count" and "qcom,mapping-groups-count" can go away and > be computed from "qcom,stream-to-cb". I just haven't done that yet. > The "qcom,iommu-secure-id" field is needed for the one scm call needed > for non-secure contexts for initial configuration. > > Anyways, at this point, I'm mostly just looking for feedback about > whether this is the best way forward, vs introducing a seperate iommu > driver, and any suggestions anyone might have. And any ideas about how > to best handle the secure context banks, since I think we have no > choice but to use them for venus (the video enc/dec block). I'd say the answer to how we handle secure contexts is "we don't". Disregarding the apparent craziness of why the non-secure world would be allowed direct control of an entirely secure device, at that point it's not really an ARM SMMU any more, it's just *some* firmware-paravirtualised IOMMU. If you can't touch the hardware, who knows what's actually under there; "qcom-scm-iommu" is welcome to its own driver. Robin. > Signed-off-by: Rob Clark <robdclark@xxxxxxxxx> > --- > drivers/iommu/arm-smmu.c | 233 +++++++++++++++++++++++++++++++++++++++++++---- > 1 file changed, 217 insertions(+), 16 deletions(-) > > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c > index 938417e..7b7d05f 100644 > --- a/drivers/iommu/arm-smmu.c > +++ b/drivers/iommu/arm-smmu.c > @@ -52,6 +52,9 @@ > #include <linux/slab.h> > #include <linux/spinlock.h> > > +#include <linux/qcom_scm.h> > + > #include <linux/amba/bus.h> > > #include "io-pgtable.h" > @@ -309,6 +312,7 @@ enum arm_smmu_implementation { > GENERIC_SMMU, > ARM_MMU500, > CAVIUM_SMMUV2, > + QCOM_SMMUV2, > }; > > struct arm_smmu_s2cr { > @@ -380,6 +384,11 @@ struct arm_smmu_device { > struct arm_smmu_s2cr *s2crs; > struct mutex stream_map_mutex; > > + void __iomem *local_base; /* specific to qcom */ > + u32 *stream_map; /* specific to qcom */ > + u32 sec_id; /* specific to qcom */ > + DECLARE_BITMAP(context_init, ARM_SMMU_MAX_CBS); /* specific to qcom */ > + > unsigned long va_size; > unsigned long ipa_size; > unsigned long pa_size; > @@ -599,6 +608,9 @@ static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu) > int count = 0; > void __iomem *gr0_base = ARM_SMMU_GR0(smmu); > > + if (smmu->model == QCOM_SMMUV2) > + return; > + > writel_relaxed(0, gr0_base + ARM_SMMU_GR0_sTLBGSYNC); > while (readl_relaxed(gr0_base + ARM_SMMU_GR0_sTLBGSTATUS) > & sTLBGSTATUS_GSACTIVE) { > @@ -739,19 +751,17 @@ static irqreturn_t arm_smmu_global_fault(int irq, void *dev) > return IRQ_HANDLED; > } > > -static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain, > - struct io_pgtable_cfg *pgtbl_cfg) > +static void arm_smmu_init_context_bank_global(struct arm_smmu_domain *smmu_domain, > + struct io_pgtable_cfg *pgtbl_cfg) > { > - u32 reg, reg2; > - u64 reg64; > + u32 reg; > bool stage1; > struct arm_smmu_cfg *cfg = &smmu_domain->cfg; > struct arm_smmu_device *smmu = smmu_domain->smmu; > - void __iomem *cb_base, *gr1_base; > + void __iomem *gr1_base; > > gr1_base = ARM_SMMU_GR1(smmu); > stage1 = cfg->cbar != CBAR_TYPE_S2_TRANS; > - cb_base = ARM_SMMU_CB_BASE(smmu) + ARM_SMMU_CB(smmu, cfg->cbndx); > > if (smmu->version > ARM_SMMU_V1) { > if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH64) > @@ -782,6 +792,20 @@ static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain, > reg |= ARM_SMMU_CB_VMID(smmu, cfg) << CBAR_VMID_SHIFT; > } > writel_relaxed(reg, gr1_base + ARM_SMMU_GR1_CBAR(cfg->cbndx)); > +} > + > +static void arm_smmu_init_context_bank_ctx(struct arm_smmu_domain *smmu_domain, > + struct io_pgtable_cfg *pgtbl_cfg) > +{ > + u32 reg, reg2; > + u64 reg64; > + bool stage1; > + struct arm_smmu_cfg *cfg = &smmu_domain->cfg; > + struct arm_smmu_device *smmu = smmu_domain->smmu; > + void __iomem *cb_base; > + > + stage1 = cfg->cbar != CBAR_TYPE_S2_TRANS; > + cb_base = ARM_SMMU_CB_BASE(smmu) + ARM_SMMU_CB(smmu, cfg->cbndx); > > /* TTBRs */ > if (stage1) { > @@ -848,8 +872,41 @@ static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain, > writel_relaxed(reg, cb_base + ARM_SMMU_CB_SCTLR); > } > > +static int qcom_smmu_find_context_bank(struct arm_smmu_device *smmu, > + struct iommu_fwspec *fwspec) > +{ > + int i; > + > + /* TODO how to handle devices w/ multiple stream-ids? I think > + * things are arranged such that I can OR all the sids together > + * and match that to an entry in stream-map?? > + */ > + u32 sid = fwspec->ids[0]; > + > + for (i = 0; i < smmu->num_context_banks; i++) > + if (smmu->stream_map[i] == sid) > + return i; > + > + dev_err(smmu->dev, "Failed to find qcom,stream-to-cb entry for 0x%0x\n", sid); > + > + return -ENXIO; > +} > + > +static int qcom_smmu_sec_program_iommu(struct arm_smmu_device *smmu, > + struct arm_smmu_domain *smmu_domain) > +{ > + if (smmu->local_base) { > +#define SMMU_INTR_SEL_NS (0x2000) > + writel_relaxed(0xffffffff, smmu->local_base + SMMU_INTR_SEL_NS); > + mb(); > + } > + > + return qcom_scm_restore_sec_cfg(smmu->sec_id, smmu_domain->cfg.cbndx); > +} > + > static int arm_smmu_init_domain_context(struct iommu_domain *domain, > - struct arm_smmu_device *smmu) > + struct arm_smmu_device *smmu, > + struct iommu_fwspec *fwspec) > { > int irq, start, ret = 0; > unsigned long ias, oas; > @@ -953,12 +1010,22 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain, > goto out_unlock; > } > > - ret = __arm_smmu_alloc_bitmap(smmu->context_map, start, > - smmu->num_context_banks); > - if (ret < 0) > - goto out_unlock; > + if (smmu->model == QCOM_SMMUV2) { > + /* need to use sid->cb mapping that is already set up: */ > + ret = qcom_smmu_find_context_bank(smmu, fwspec); > + if (ret < 0) > + goto out_unlock; > + > + cfg->cbndx = ret; > + } else { > + ret = __arm_smmu_alloc_bitmap(smmu->context_map, start, > + smmu->num_context_banks); > + if (ret < 0) > + goto out_unlock; > + cfg->cbndx = ret; > + } > + > > - cfg->cbndx = ret; > if (smmu->version < ARM_SMMU_V2) { > cfg->irptndx = atomic_inc_return(&smmu->irptndx); > cfg->irptndx %= smmu->num_context_irqs; > @@ -986,8 +1053,23 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain, > domain->geometry.aperture_end = (1UL << ias) - 1; > domain->geometry.force_aperture = true; > > - /* Initialise the context bank with our page table cfg */ > - arm_smmu_init_context_bank(smmu_domain, &pgtbl_cfg); > + if (smmu->model == QCOM_SMMUV2) { > + /* initializing the hw must happen only on first attach.. > + * which we otherwise run afoul of since a _DMA domain > + * is automatically attached before drm/msm overrides > + * and attaches it's own domain. > + */ > + if (!test_and_set_bit(cfg->cbndx, smmu->context_init)) { > + ret = qcom_smmu_sec_program_iommu(smmu, smmu_domain); > + if (ret) > + goto out_unlock; > + } > + } else { > + /* Initialise the context bank with our page table cfg */ > + arm_smmu_init_context_bank_global(smmu_domain, &pgtbl_cfg); > + } > + > + arm_smmu_init_context_bank_ctx(smmu_domain, &pgtbl_cfg); > > /* > * Request context fault interrupt. Do this last to avoid the > @@ -1107,6 +1189,8 @@ static void arm_smmu_write_s2cr(struct arm_smmu_device *smmu, int idx) > > static void arm_smmu_write_sme(struct arm_smmu_device *smmu, int idx) > { > + if (smmu->model == QCOM_SMMUV2) > + return; > arm_smmu_write_s2cr(smmu, idx); > if (smmu->smrs) > arm_smmu_write_smr(smmu, idx); > @@ -1288,6 +1372,10 @@ static int arm_smmu_domain_add_master(struct arm_smmu_domain *smmu_domain, > s2cr[idx].type = type; > s2cr[idx].privcfg = S2CR_PRIVCFG_DEFAULT; > s2cr[idx].cbndx = cbndx; > + > + if (smmu->model == QCOM_SMMUV2) > + continue; > + > arm_smmu_write_s2cr(smmu, idx); > } > return 0; > @@ -1317,7 +1405,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) > > smmu = fwspec_smmu(fwspec); > /* Ensure that the domain is finalised */ > - ret = arm_smmu_init_domain_context(domain, smmu); > + ret = arm_smmu_init_domain_context(domain, smmu, fwspec); > if (ret < 0) > return ret; > > @@ -1682,6 +1770,9 @@ static void arm_smmu_device_reset(struct arm_smmu_device *smmu) > int i; > u32 reg, major; > > + if (smmu->model == QCOM_SMMUV2) > + return; > + > /* clear global FSR */ > reg = readl_relaxed(ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sGFSR); > writel(reg, ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sGFSR); > @@ -1764,6 +1855,103 @@ static void arm_smmu_device_reset(struct arm_smmu_device *smmu) > writel(reg, ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0); > } > > +/* for devices with qcom,smmu-v2, we cannot touch the global space, > + * so get all the configuration from devicetree. > + */ > +static int qcom_smmu_device_cfg_probe(struct arm_smmu_device *smmu) > +{ > + struct device_node *np = smmu->dev->of_node; > + u32 size, cb_offset, i; > + int ret; > + > + if (of_property_read_u32(np, "qcom,cb-count", &smmu->num_context_banks)) { > + dev_err(smmu->dev, "missing qcom,cb-count property\n"); > + return -ENODEV; > + } > + > + dev_notice(smmu->dev, "\t%u context banks\n", smmu->num_context_banks); > + > + // TODO get from dt? > + smmu->pgsize_bitmap = SZ_4K; > + > + dev_notice(smmu->dev, "\tSupported page sizes: 0x%08lx\n", > + smmu->pgsize_bitmap); > + > + if (of_property_read_u32(np, "qcom,mapping-groups-count", &size)) { > + dev_err(smmu->dev, "missing qcom,cb-count property\n"); > + return -ENODEV; > + } > + > + if (of_property_read_u32(np, "qcom,cb-base-offset", &cb_offset)) > + cb_offset = 0x8000; > + > + if (of_property_read_u32(np, "qcom,iommu-secure-id", &smmu->sec_id)) { > + dev_err(smmu->dev, "missing qcom,iommu-secure-id property\n"); > + return -ENODEV; > + } > + > + /* > + * Bit of a hack perhaps.. ARM_SMMU_CB_BASE() assumes that cb0 starts > + * half way through the IO region size.. not sure that is really true. > + */ > + smmu->size = cb_offset << 1; > + smmu->pgshift = 12; > + > + smmu->stream_map = devm_kmalloc_array(smmu->dev, smmu->num_context_banks, > + sizeof(*smmu->stream_map), > + GFP_KERNEL); > + if (!smmu->stream_map) > + return -ENOMEM; > + > + for (i = 0; i < smmu->num_context_banks; i++) { > + ret = of_property_read_u32_index(np, "qcom,stream-to-cb", i, > + &smmu->stream_map[i]); > + if (ret) { > + dev_err(smmu->dev, "could not read qcom,stream-to-cb idx %d\n", i); > + return ret; > + } > + } > + > + /* > + * The hardware does actually support stream-matching, but we cannot > + * touch the global space, so we have to live with how secure world > + * has set things up for us. So don't set ARM_SMMU_FEAT_STREAM_MATCH > + * and leave smmu->smrs as NULL. We just need to arrange that each > + * domain gets mapped to the correct context in s2crs. > + */ > + > + smmu->s2crs = devm_kmalloc_array(smmu->dev, size, sizeof(*smmu->s2crs), > + GFP_KERNEL); > + if (!smmu->s2crs) > + return -ENOMEM; > + > + for (i = 0; i < size; i++) > + smmu->s2crs[i] = s2cr_init_val; > + > + smmu->num_mapping_groups = size; > + > + // *shrug*? > + smmu->streamid_mask = ~0; > + smmu->smr_mask_mask = ~0; > + smmu->features = ARM_SMMU_FEAT_TRANS_S1 | ARM_SMMU_FEAT_FMT_AARCH32_L; > + > + /* TODO we might need to be able to use this on some 32b devices? */ > + smmu->va_size = 48; > + smmu->ipa_size = 48; > + smmu->pa_size = 48; > + > + // XXX deduplicate from arm_smmu_device_cfg_probe() > + if (smmu->features & > + (ARM_SMMU_FEAT_FMT_AARCH32_L | ARM_SMMU_FEAT_FMT_AARCH64_4K)) > + smmu->pgsize_bitmap |= SZ_4K | SZ_2M | SZ_1G; > + if (smmu->features & ARM_SMMU_FEAT_FMT_AARCH64_16K) > + smmu->pgsize_bitmap |= SZ_16K | SZ_32M; > + if (smmu->features & ARM_SMMU_FEAT_FMT_AARCH64_64K) > + smmu->pgsize_bitmap |= SZ_64K | SZ_512M; > + > + return 0; > +} > + > static int arm_smmu_id_size_to_bits(int size) > { > switch (size) { > @@ -1833,6 +2021,11 @@ static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu) > dev_notice(smmu->dev, "SMMUv%d with:\n", > smmu->version == ARM_SMMU_V2 ? 2 : 1); > > + mutex_init(&smmu->stream_map_mutex); > + > + if (smmu->model == QCOM_SMMUV2) > + return qcom_smmu_device_cfg_probe(smmu); > + > /* ID0 */ > id = readl_relaxed(gr0_base + ARM_SMMU_GR0_ID0); > > @@ -1918,7 +2111,6 @@ static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu) > smmu->s2crs[i] = s2cr_init_val; > > smmu->num_mapping_groups = size; > - mutex_init(&smmu->stream_map_mutex); > > if (smmu->version < ARM_SMMU_V2 || !(id & ID0_PTFS_NO_AARCH32)) { > smmu->features |= ARM_SMMU_FEAT_FMT_AARCH32_L; > @@ -2034,6 +2226,7 @@ static struct arm_smmu_match_data name = { .version = ver, .model = imp } > > ARM_SMMU_MATCH_DATA(smmu_generic_v1, ARM_SMMU_V1, GENERIC_SMMU); > ARM_SMMU_MATCH_DATA(smmu_generic_v2, ARM_SMMU_V2, GENERIC_SMMU); > +ARM_SMMU_MATCH_DATA(qcom_smmuv2, ARM_SMMU_V2, QCOM_SMMUV2); > ARM_SMMU_MATCH_DATA(arm_mmu401, ARM_SMMU_V1_64K, GENERIC_SMMU); > ARM_SMMU_MATCH_DATA(arm_mmu500, ARM_SMMU_V2, ARM_MMU500); > ARM_SMMU_MATCH_DATA(cavium_smmuv2, ARM_SMMU_V2, CAVIUM_SMMUV2); > @@ -2041,6 +2234,7 @@ ARM_SMMU_MATCH_DATA(cavium_smmuv2, ARM_SMMU_V2, CAVIUM_SMMUV2); > static const struct of_device_id arm_smmu_of_match[] = { > { .compatible = "arm,smmu-v1", .data = &smmu_generic_v1 }, > { .compatible = "arm,smmu-v2", .data = &smmu_generic_v2 }, > + { .compatible = "qcom,smmu-v2", .data = &qcom_smmuv2 }, > { .compatible = "arm,mmu-400", .data = &smmu_generic_v1 }, > { .compatible = "arm,mmu-401", .data = &arm_mmu401 }, > { .compatible = "arm,mmu-500", .data = &arm_mmu500 }, > @@ -2172,6 +2366,13 @@ static int arm_smmu_device_probe(struct platform_device *pdev) > return PTR_ERR(smmu->base); > smmu->size = resource_size(res); > > + if (smmu->model == QCOM_SMMUV2) { > + res = platform_get_resource_byname(pdev, IORESOURCE_MEM, > + "smmu_local_base"); > + if (res) > + smmu->local_base = devm_ioremap_resource(dev, res); > + } > + > num_irqs = 0; > while ((res = platform_get_resource(pdev, IORESOURCE_IRQ, num_irqs))) { > num_irqs++; > -- To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html