On Fri, Jan 22, 2021 at 12:53:17PM +0000, Robin Murphy wrote: > On 2021-01-22 12:41, Will Deacon wrote: > >On Tue, Nov 24, 2020 at 12:15:58PM -0700, Jordan Crouse wrote: > >>Call report_iommu_fault() to allow upper-level drivers to register their > >>own fault handlers. > >> > >>Signed-off-by: Jordan Crouse <jcrouse@xxxxxxxxxxxxxx> > >>--- > >> > >> drivers/iommu/arm/arm-smmu/arm-smmu.c | 16 +++++++++++++--- > >> 1 file changed, 13 insertions(+), 3 deletions(-) > >> > >>diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c > >>index 0f28a8614da3..7fd18bbda8f5 100644 > >>--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c > >>+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c > >>@@ -427,6 +427,7 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev) > >> struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); > >> struct arm_smmu_device *smmu = smmu_domain->smmu; > >> int idx = smmu_domain->cfg.cbndx; > >>+ int ret; > >> fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR); > >> if (!(fsr & ARM_SMMU_FSR_FAULT)) > >>@@ -436,11 +437,20 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev) > >> iova = arm_smmu_cb_readq(smmu, idx, ARM_SMMU_CB_FAR); > >> cbfrsynra = arm_smmu_gr1_read(smmu, ARM_SMMU_GR1_CBFRSYNRA(idx)); > >>- dev_err_ratelimited(smmu->dev, > >>- "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n", > >>+ ret = report_iommu_fault(domain, dev, iova, > >>+ fsynr & ARM_SMMU_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ); > >>+ > >>+ if (ret == -ENOSYS) > >>+ dev_err_ratelimited(smmu->dev, > >>+ "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n", > >> fsr, iova, fsynr, cbfrsynra, idx); > >>- arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, fsr); > >>+ /* > >>+ * If the iommu fault returns an error (except -ENOSYS) then assume that > >>+ * they will handle resuming on their own > >>+ */ > >>+ if (!ret || ret == -ENOSYS) > >>+ arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, fsr); > > > >Hmm, I don't grok this part. If the fault handler returned an error and > >we don't clear the FSR, won't we just re-take the irq immediately? > > If we don't touch the FSR at all, yes. Even if we clear the fault indicator > bits, the interrupt *might* remain asserted until a stalled transaction is > actually resolved - that's that lovely IMP-DEF corner. > > Robin. > This is for stall-on-fault. The idea is that if the developer chooses to do so we would stall the GPU after a fault long enough to take a picture of it with devcoredump and then release the FSR. Since we can't take the devcoredump from the interrupt handler we schedule it in a worker and then return an error to let the main handler know that we'll come back around clear the FSR later when we are done. It is assumed that we'll have to turn off interrupts in our handler to allow this to work. Its all very implementation specific, but then again we're assuming that if you want to do this then you know what you are doing. In that spirit the error that skips the FSR should probably be something specific instead of "all errors" - that way a well meaning handler that returns a -EINVAL doesn't accidentally break itself. Jordan > >I think > >it would be better to do this unconditionally, and print the "Unhandled > >context fault" message for any non-zero value of ret. > > > >Will > > -- The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project