On Tue, Mar 4, 2025 at 8:57 AM Connor Abbott <cwabbott0@xxxxxxxxx> wrote: > > On some SMMUv2 implementations, including MMU-500, SMMU_CBn_FSR.SS > asserts an interrupt. The only way to clear that bit is to resume the > transaction by writing SMMU_CBn_RESUME, but typically resuming the > transaction requires complex operations (copying in pages, etc.) that > can't be done in IRQ context. drm/msm already has a problem, because > its fault handler sometimes schedules a job to dump the GPU state and > doesn't resume translation until this is complete. > > Work around this by disabling context fault interrupts until after the > transaction is resumed. Because other context banks can share an IRQ > line, we may still get an interrupt intended for another context bank, > but in this case only SMMU_CBn_FSR.SS will be asserted and we can skip > it assuming that interrupts are disabled which is accomplished by > removing the bit from ARM_SMMU_CB_FSR_FAULT. SMMU_CBn_FSR.SS won't be > asserted unless an external user enabled stall-on-fault, and they are > expected to resume the translation and re-enable interrupts. > > Signed-off-by: Connor Abbott <cwabbott0@xxxxxxxxx> > Reviewed-by Robin Murphy <robin.murphy@xxxxxxx> Reviewed-by: Rob Clark <robdclark@xxxxxxxxx> > --- > drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 15 ++++++++++- > drivers/iommu/arm/arm-smmu/arm-smmu.c | 41 +++++++++++++++++++++++++++++- > drivers/iommu/arm/arm-smmu/arm-smmu.h | 1 - > 3 files changed, 54 insertions(+), 3 deletions(-) > > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c > index 186d6ad4fd1c990398df4dec53f4d58ada9e658c..a428e53add08d451fb2152e3ab80e0fba936e214 100644 > --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c > @@ -90,12 +90,25 @@ static void qcom_adreno_smmu_resume_translation(const void *cookie, bool termina > struct arm_smmu_domain *smmu_domain = (void *)cookie; > struct arm_smmu_cfg *cfg = &smmu_domain->cfg; > struct arm_smmu_device *smmu = smmu_domain->smmu; > - u32 reg = 0; > + u32 reg = 0, sctlr; > + unsigned long flags; > > if (terminate) > reg |= ARM_SMMU_RESUME_TERMINATE; > > + spin_lock_irqsave(&smmu_domain->cb_lock, flags); > + > arm_smmu_cb_write(smmu, cfg->cbndx, ARM_SMMU_CB_RESUME, reg); > + > + /* > + * Re-enable interrupts after they were disabled by > + * arm_smmu_context_fault(). > + */ > + sctlr = arm_smmu_cb_read(smmu, cfg->cbndx, ARM_SMMU_CB_SCTLR); > + sctlr |= ARM_SMMU_SCTLR_CFIE; > + arm_smmu_cb_write(smmu, cfg->cbndx, ARM_SMMU_CB_SCTLR, sctlr); > + > + spin_unlock_irqrestore(&smmu_domain->cb_lock, flags); > } > > #define QCOM_ADRENO_SMMU_GPU_SID 0 > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c > index 498b96e95cb4fdb67c246ef13de1eb8f40d68f7d..284079ef95cd2deeb71816a284850523897badd8 100644 > --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c > @@ -466,13 +466,52 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev) > if (!(cfi->fsr & ARM_SMMU_CB_FSR_FAULT)) > return IRQ_NONE; > > + /* > + * On some implementations FSR.SS asserts a context fault > + * interrupt. We do not want this behavior, because resolving the > + * original context fault typically requires operations that cannot be > + * performed in IRQ context but leaving the stall unacknowledged will > + * immediately lead to another spurious interrupt as FSR.SS is still > + * set. Work around this by disabling interrupts for this context bank. > + * It's expected that interrupts are re-enabled after resuming the > + * translation. > + * > + * We have to do this before report_iommu_fault() so that we don't > + * leave interrupts disabled in case the downstream user decides the > + * fault can be resolved inside its fault handler. > + * > + * There is a possible race if there are multiple context banks sharing > + * the same interrupt and both signal an interrupt in between writing > + * RESUME and SCTLR. We could disable interrupts here before we > + * re-enable them in the resume handler, leaving interrupts enabled. > + * Lock the write to serialize it with the resume handler. > + */ > + if (cfi->fsr & ARM_SMMU_CB_FSR_SS) { > + u32 val; > + > + spin_lock(&smmu_domain->cb_lock); > + val = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_SCTLR); > + val &= ~ARM_SMMU_SCTLR_CFIE; > + arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_SCTLR, val); > + spin_unlock(&smmu_domain->cb_lock); > + } > + > + /* > + * The SMMUv2 architecture specification says that if stall-on-fault is > + * enabled the correct sequence is to write to SMMU_CBn_FSR to clear > + * the fault and then write to SMMU_CBn_RESUME. Clear the interrupt > + * first before running the user's fault handler to make sure we follow > + * this sequence. It should be ok if there is another fault in the > + * meantime because we have already read the fault info. > + */ > + arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, cfi->fsr); > + > ret = report_iommu_fault(&smmu_domain->domain, NULL, cfi->iova, > cfi->fsynr0 & ARM_SMMU_CB_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ); > > if (ret == -ENOSYS && __ratelimit(&rs)) > arm_smmu_print_context_fault_info(smmu, idx, cfi); > > - arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, cfi->fsr); > return IRQ_HANDLED; > } > > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h > index 411d807e0a7033833716635efb3968a0bd3ff237..4235b772c2cb032778816578c9e6644512543a5e 100644 > --- a/drivers/iommu/arm/arm-smmu/arm-smmu.h > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h > @@ -214,7 +214,6 @@ enum arm_smmu_cbar_type { > ARM_SMMU_CB_FSR_TLBLKF) > > #define ARM_SMMU_CB_FSR_FAULT (ARM_SMMU_CB_FSR_MULTI | \ > - ARM_SMMU_CB_FSR_SS | \ > ARM_SMMU_CB_FSR_UUT | \ > ARM_SMMU_CB_FSR_EF | \ > ARM_SMMU_CB_FSR_PF | \ > > -- > 2.47.1 >