Hi Rob, >> > Although it's not really Linux's job to save hardware integrators from >> > their own misfortune, it *is* our job to stop userspace (e.g. VFIO >> > clients) from hosing the system for everybody else, even if they might >> > already be required to have elevated privileges. >> > >> > Given that the fault handling code currently executes entirely in IRQ >> > context, there is nothing that can sensibly be done to recover from >> > things like page faults anyway, so let's rip this code out for now and >> > avoid the potential for deadlock. >> >> Hi Will, >> >> so, I'd like to re-introduce this feature, I *guess* as some sort of >> opt-in quirk (ie. disabled by default unless something in DT tells you >> otherwise?? But I'm open to suggestions. I'm not entirely sure what >> hw was having problems due to this feature.) >> >> On newer snapdragon devices we are using arm-smmu for the GPU, and >> halting the GPU so the driver's fault handler can dump some GPU state >> on faults is enormously helpful for debugging and tracking down where >> in the gpu cmdstream the fault was triggered. In addition, we will >> eventually want the ability to update pagetables from fault handler >> and resuming the faulting transition. >> >> Some additional comments below.. >> >> > Cc: <stable@xxxxxxxxxxxxxxx> >> > Reported-by: Matt Evans <matt.evans@xxxxxxx> >> > Signed-off-by: Will Deacon <will.deacon@xxxxxxx> >> > --- >> > drivers/iommu/arm-smmu.c | 34 +++++++--------------------------- >> > 1 file changed, 7 insertions(+), 27 deletions(-) >> > >> > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c >> > index 4f49fe29f202..2db74ebc3240 100644 >> > --- a/drivers/iommu/arm-smmu.c >> > +++ b/drivers/iommu/arm-smmu.c >> > @@ -686,8 +686,7 @@ static struct iommu_gather_ops arm_smmu_gather_ops = { >> > >> > static irqreturn_t arm_smmu_context_fault(int irq, void *dev) >> > { >> > - int flags, ret; >> > - u32 fsr, fsynr, resume; >> > + u32 fsr, fsynr; >> > unsigned long iova; >> > struct iommu_domain *domain = dev; >> > struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); >> > @@ -701,34 +700,15 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev) >> > if (!(fsr & FSR_FAULT)) >> > return IRQ_NONE; >> > >> > - if (fsr & FSR_IGN) >> > - dev_err_ratelimited(smmu->dev, >> > - "Unexpected context fault (fsr 0x%x)\n", >> > - fsr); >> > - >> > fsynr = readl_relaxed(cb_base + ARM_SMMU_CB_FSYNR0); >> > - flags = fsynr & FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ; >> > - >> > iova = readq_relaxed(cb_base + ARM_SMMU_CB_FAR); >> > - if (!report_iommu_fault(domain, smmu->dev, iova, flags)) { >> > - ret = IRQ_HANDLED; >> > - resume = RESUME_RETRY; >> > - } else { >> > - dev_err_ratelimited(smmu->dev, >> > - "Unhandled context fault: iova=0x%08lx, fsynr=0x%x, cb=%d\n", >> > - iova, fsynr, cfg->cbndx); >> >> I would like to decouple this dev_err_ratelimit() print from the >> RESUME_RETRY vs RESUME_TERMINATE behaviour. I need the ability to >> indicate by return from my fault handler whether to resume or >> terminate. But I already have my own ratelimted prints and would >> prefer not to spam dmesg twice. >> >> I'm thinking about report_iommu_fault() returning: >> >> 0 => RESUME_RETRY >> -EFAULT => RESUME_TERMINATE but don't print >> anything else (or specifically -ENOSYS?) => RESUME_TERMINATE and print >> >> thoughts? >> >> > - ret = IRQ_NONE; >> > - resume = RESUME_TERMINATE; >> > - } >> > - >> > - /* Clear the faulting FSR */ >> > - writel(fsr, cb_base + ARM_SMMU_CB_FSR); >> > >> > - /* Retry or terminate any stalled transactions */ >> > - if (fsr & FSR_SS) >> > - writel_relaxed(resume, cb_base + ARM_SMMU_CB_RESUME); >> >> This might be a bug in qcom's implementation of the smmu spec, but >> seems like we don't have SS bit set, yet we still require RESUME reg >> to be written, otherwise gpu is perma-wedged. Maybe topic for a >> separate quirk? I'm not sure if writing RESUME reg on other hw when >> SS bit is not set is likely to cause problems? If not I suppose we >> could just unconditionally write it. >> >> Anyways, I'm not super-familiar w/ arm-smmu so suggestions welcome.. >> in between debugging freedreno I'll try to put together some patches. > >From what I can tell we need SCTLR_CFCFG to make the stall happen otherwise >the operation just gets terminated immediately and *then* we get notification >but by then the system keeps going. > Yes thats right, SCTLR_CFCFG was removed as a part of this patch. >I think SCTLR_HUPCF helps control that behavior (i.e. we don't go off faulting >through eternity) but I don't know how it works. > >From my very unlearned understanding I think we do want to set CFCFG and then >stall and let the interrupt handler decide to retry/terminate. Yes, infact i was thinking of adding this as a new patch, but now that this was a reverted one. As you said, it would be good to know the hw which was having problem with this and probably having this has a quirk otherwise. Also i see that FSR_SS is implemented by the qcom and the downstream code uses it. Regards, Sricharan -- To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html