Re: [PATCH v4 3/5] iommu/arm-smmu: Fix spurious interrupts with stall-on-fault

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Mar 4, 2025 at 8:57 AM Connor Abbott <cwabbott0@xxxxxxxxx> wrote:
>
> On some SMMUv2 implementations, including MMU-500, SMMU_CBn_FSR.SS
> asserts an interrupt. The only way to clear that bit is to resume the
> transaction by writing SMMU_CBn_RESUME, but typically resuming the
> transaction requires complex operations (copying in pages, etc.) that
> can't be done in IRQ context. drm/msm already has a problem, because
> its fault handler sometimes schedules a job to dump the GPU state and
> doesn't resume translation until this is complete.
>
> Work around this by disabling context fault interrupts until after the
> transaction is resumed. Because other context banks can share an IRQ
> line, we may still get an interrupt intended for another context bank,
> but in this case only SMMU_CBn_FSR.SS will be asserted and we can skip
> it assuming that interrupts are disabled which is accomplished by
> removing the bit from ARM_SMMU_CB_FSR_FAULT. SMMU_CBn_FSR.SS won't be
> asserted unless an external user enabled stall-on-fault, and they are
> expected to resume the translation and re-enable interrupts.
>
> Signed-off-by: Connor Abbott <cwabbott0@xxxxxxxxx>
> Reviewed-by Robin Murphy <robin.murphy@xxxxxxx>

Reviewed-by: Rob Clark <robdclark@xxxxxxxxx>

> ---
>  drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 15 ++++++++++-
>  drivers/iommu/arm/arm-smmu/arm-smmu.c      | 41 +++++++++++++++++++++++++++++-
>  drivers/iommu/arm/arm-smmu/arm-smmu.h      |  1 -
>  3 files changed, 54 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> index 186d6ad4fd1c990398df4dec53f4d58ada9e658c..a428e53add08d451fb2152e3ab80e0fba936e214 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> @@ -90,12 +90,25 @@ static void qcom_adreno_smmu_resume_translation(const void *cookie, bool termina
>         struct arm_smmu_domain *smmu_domain = (void *)cookie;
>         struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
>         struct arm_smmu_device *smmu = smmu_domain->smmu;
> -       u32 reg = 0;
> +       u32 reg = 0, sctlr;
> +       unsigned long flags;
>
>         if (terminate)
>                 reg |= ARM_SMMU_RESUME_TERMINATE;
>
> +       spin_lock_irqsave(&smmu_domain->cb_lock, flags);
> +
>         arm_smmu_cb_write(smmu, cfg->cbndx, ARM_SMMU_CB_RESUME, reg);
> +
> +       /*
> +        * Re-enable interrupts after they were disabled by
> +        * arm_smmu_context_fault().
> +        */
> +       sctlr = arm_smmu_cb_read(smmu, cfg->cbndx, ARM_SMMU_CB_SCTLR);
> +       sctlr |= ARM_SMMU_SCTLR_CFIE;
> +       arm_smmu_cb_write(smmu, cfg->cbndx, ARM_SMMU_CB_SCTLR, sctlr);
> +
> +       spin_unlock_irqrestore(&smmu_domain->cb_lock, flags);
>  }
>
>  #define QCOM_ADRENO_SMMU_GPU_SID 0
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> index 498b96e95cb4fdb67c246ef13de1eb8f40d68f7d..284079ef95cd2deeb71816a284850523897badd8 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> @@ -466,13 +466,52 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
>         if (!(cfi->fsr & ARM_SMMU_CB_FSR_FAULT))
>                 return IRQ_NONE;
>
> +       /*
> +        * On some implementations FSR.SS asserts a context fault
> +        * interrupt. We do not want this behavior, because resolving the
> +        * original context fault typically requires operations that cannot be
> +        * performed in IRQ context but leaving the stall unacknowledged will
> +        * immediately lead to another spurious interrupt as FSR.SS is still
> +        * set. Work around this by disabling interrupts for this context bank.
> +        * It's expected that interrupts are re-enabled after resuming the
> +        * translation.
> +        *
> +        * We have to do this before report_iommu_fault() so that we don't
> +        * leave interrupts disabled in case the downstream user decides the
> +        * fault can be resolved inside its fault handler.
> +        *
> +        * There is a possible race if there are multiple context banks sharing
> +        * the same interrupt and both signal an interrupt in between writing
> +        * RESUME and SCTLR. We could disable interrupts here before we
> +        * re-enable them in the resume handler, leaving interrupts enabled.
> +        * Lock the write to serialize it with the resume handler.
> +        */
> +       if (cfi->fsr & ARM_SMMU_CB_FSR_SS) {
> +               u32 val;
> +
> +               spin_lock(&smmu_domain->cb_lock);
> +               val = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_SCTLR);
> +               val &= ~ARM_SMMU_SCTLR_CFIE;
> +               arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_SCTLR, val);
> +               spin_unlock(&smmu_domain->cb_lock);
> +       }
> +
> +       /*
> +        * The SMMUv2 architecture specification says that if stall-on-fault is
> +        * enabled the correct sequence is to write to SMMU_CBn_FSR to clear
> +        * the fault and then write to SMMU_CBn_RESUME. Clear the interrupt
> +        * first before running the user's fault handler to make sure we follow
> +        * this sequence. It should be ok if there is another fault in the
> +        * meantime because we have already read the fault info.
> +        */
> +       arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, cfi->fsr);
> +
>         ret = report_iommu_fault(&smmu_domain->domain, NULL, cfi->iova,
>                 cfi->fsynr0 & ARM_SMMU_CB_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
>
>         if (ret == -ENOSYS && __ratelimit(&rs))
>                 arm_smmu_print_context_fault_info(smmu, idx, cfi);
>
> -       arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, cfi->fsr);
>         return IRQ_HANDLED;
>  }
>
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
> index 411d807e0a7033833716635efb3968a0bd3ff237..4235b772c2cb032778816578c9e6644512543a5e 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
> @@ -214,7 +214,6 @@ enum arm_smmu_cbar_type {
>                                          ARM_SMMU_CB_FSR_TLBLKF)
>
>  #define ARM_SMMU_CB_FSR_FAULT          (ARM_SMMU_CB_FSR_MULTI |        \
> -                                        ARM_SMMU_CB_FSR_SS |           \
>                                          ARM_SMMU_CB_FSR_UUT |          \
>                                          ARM_SMMU_CB_FSR_EF |           \
>                                          ARM_SMMU_CB_FSR_PF |           \
>
> --
> 2.47.1
>





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [Linux for Sparc]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux