Re: [PATCH] iomm/arm-smmu: Add stall implementation hook

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2020-05-07 11:14 am, Sai Prakash Ranjan wrote:
Hi Will, Robin

On 2020-04-22 01:50, Sai Prakash Ranjan wrote:
Add stall implementation hook to enable stalling
faults on QCOM platforms which supports it without
causing any kind of hardware mishaps. Without this
on QCOM platforms, GPU faults can cause unrelated
GPU memory accesses to return zeroes. This has the
unfortunate result of command-stream reads from CP
getting invalid data, causing a cascade of fail.

I think this came up before, but something about this rationale doesn't add up - we're not *using* stalls at all, we're still terminating faulting transactions unconditionally; we're just using CFCFG to terminate them with a slight delay, rather than immediately. It's really not clear how or why that makes a difference. Is it a GPU bug? Or an SMMU bug? Is this reliable (or even a documented workaround for something), or might things start blowing up again if any other behaviour subtly changes? I'm not dead set against adding this, but I'd *really* like to have a lot more confidence in it.

Suggested-by: Rob Clark <robdclark@xxxxxxxxx>
Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@xxxxxxxxxxxxxx>
---
This has been attempted previously by Rob Clark in 2017, 2018.
Hopefully we can get something concluded in 2020.
 * https://patchwork.kernel.org/patch/9953803/
 * https://patchwork.kernel.org/patch/10618713/
---
 drivers/iommu/arm-smmu-qcom.c | 1 +
 drivers/iommu/arm-smmu.c      | 7 +++++++
 drivers/iommu/arm-smmu.h      | 1 +
 3 files changed, 9 insertions(+)

diff --git a/drivers/iommu/arm-smmu-qcom.c b/drivers/iommu/arm-smmu-qcom.c
index 24c071c1d8b0..a13b229389d4 100644
--- a/drivers/iommu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm-smmu-qcom.c
@@ -32,6 +32,7 @@ static int qcom_sdm845_smmu500_reset(struct
arm_smmu_device *smmu)

 static const struct arm_smmu_impl qcom_smmu_impl = {
     .reset = qcom_sdm845_smmu500_reset,
+    .stall = true,
 };

 struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu)
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index e622f4e33379..16b03fca9966 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -488,6 +488,11 @@ static irqreturn_t arm_smmu_context_fault(int
irq, void *dev)
                 fsr, iova, fsynr, cbfrsynra, idx);

     arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, fsr);
+
+    if (smmu->impl && smmu->impl->stall && (fsr & ARM_SMMU_FSR_SS))
+        arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_RESUME,
+                  ARM_SMMU_RESUME_TERMINATE);

Shouldn't this be *before* the write to FSR, in case the outstanding fault causes that to be immediately reasserted before we write CB_RESUME and we end up immediately taking the IRQ a second time?

(The overall enablement being in impl is sound, but you still don't get to play "works on my machine" in the architectural code :P)

Robin.

+
     return IRQ_HANDLED;
 }

@@ -659,6 +664,8 @@ static void arm_smmu_write_context_bank(struct
arm_smmu_device *smmu, int idx)
         reg |= ARM_SMMU_SCTLR_S1_ASIDPNE;
     if (IS_ENABLED(CONFIG_CPU_BIG_ENDIAN))
         reg |= ARM_SMMU_SCTLR_E;
+    if (smmu->impl && smmu->impl->stall)
+        reg |= ARM_SMMU_SCTLR_CFCFG;

     arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_SCTLR, reg);
 }
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index 8d1cd54d82a6..d5134e0d5cce 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -386,6 +386,7 @@ struct arm_smmu_impl {
     int (*init_context)(struct arm_smmu_domain *smmu_domain);
     void (*tlb_sync)(struct arm_smmu_device *smmu, int page, int sync,
              int status);
+    bool stall;
 };

 static inline void __iomem *arm_smmu_page(struct arm_smmu_device *smmu, int n)

Any comments on this patch?

Thanks,
Sai




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [Linux for Sparc]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux