On Wed, May 29, 2019 at 01:55:48PM +0200, Marc Gonzalez wrote: > From: Robin Murphy <robin.murphy@xxxxxxx> > > Apparently, some Qualcomm arm64 platforms which appear to expose their > SMMU global register space are still, in fact, using a hypervisor to > mediate it by trapping and emulating register accesses. Sadly, some > deployed versions of said trapping code have bugs wherein they go > horribly wrong for stores using r31 (i.e. XZR/WZR) as the source > register. ^^^ This should be in the comment instead of "qcom bug". > While this can be mitigated for GCC today by tweaking the constraints > for the implementation of writel_relaxed(), to avoid any potential > arms race with future compilers more aggressively optimising register > allocation, the simple way is to just remove all the problematic > constant zeros. For the write-only TLB operations, the actual value is > irrelevant anyway and any old nearby variable will provide a suitable > GPR to encode. The one point at which we really do need a zero to clear > a context bank happens before any of the TLB maintenance where crashes > have been reported, so is apparently not a problem... :/ Hmm. It would be nice to understand this a little better. In which cases does XZR appear to work? > Reported-by: AngeloGioacchino Del Regno <kholk11@xxxxxxxxx> > Reviewed-by: Marc Gonzalez <marc.w.gonzalez@xxxxxxx> > Reviewed-by: Bjorn Andersson <bjorn.andersson@xxxxxxxxxx> > Tested-by: AngeloGioacchino Del Regno <kholk11@xxxxxxxxx> > Tested-by: Marc Gonzalez <marc.w.gonzalez@xxxxxxx> > Tested-by: Bjorn Andersson <bjorn.andersson@xxxxxxxxxx> > Signed-off-by: Robin Murphy <robin.murphy@xxxxxxx> > --- > Changes from v1: > - Tweak commit message (remove "compilers", s/hangs/crashes) > - Add a comment before writel_relaxed > --- > drivers/iommu/arm-smmu.c | 8 +++++--- > 1 file changed, 5 insertions(+), 3 deletions(-) > > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c > index 5e54cc0a28b3..3f352268fa8b 100644 > --- a/drivers/iommu/arm-smmu.c > +++ b/drivers/iommu/arm-smmu.c > @@ -423,7 +423,8 @@ static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu, > { > unsigned int spin_cnt, delay; > > - writel_relaxed(0, sync); > + /* Write "garbage" (rather than 0) to work around a qcom bug */ > + writel_relaxed((unsigned long)sync, sync); > for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) { > for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) { > if (!(readl_relaxed(status) & sTLBGSTATUS_GSACTIVE)) > @@ -1763,8 +1764,9 @@ static void arm_smmu_device_reset(struct arm_smmu_device *smmu) > } > > /* Invalidate the TLB, just in case */ > - writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH); > - writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH); > + /* Write "garbage" (rather than 0) to work around a qcom bug */ > + writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_TLBIALLH); > + writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH); Any reason not to make these obviously dummy values e.g.: /* * Text from the commit message about broken hypervisor */ #define QCOM_DUMMY_VAL_NOT_XZR ~0U That makes the callsites much easier to understand and I doubt there's a performance impact from allocating an extra register here. Will