Re: [PATCH] KVM: arm64: Always invalidate TLB for stage-2 permission faults

Marc Zyngier <maz@xxxxxxxxxx> · Sun, 24 Sep 2023 11:12:30 +0100

On Sat, 23 Sep 2023 00:08:21 +0100,
Oliver Upton <oliver.upton@xxxxxxxxx> wrote:
> 
> On Fri, Sep 22, 2023 at 10:32:29PM +0000, Oliver Upton wrote:
> > It is possible for multiple vCPUs to fault on the same IPA and attempt
> > to resolve the fault. One of the page table walks will actually update
> > the PTE and the rest will return -EAGAIN per our race detection scheme.
> > KVM elides the TLB invalidation on the racing threads as the return
> > value is nonzero.
> > 
> > Before commit a12ab1378a88 ("KVM: arm64: Use local TLBI on permission
> > relaxation") KVM always used broadcast TLB invalidations when handling
> > permission faults, which had the convenient property of making the
> > stage-2 updates visible to all CPUs in the system. However now we do a
> > local invalidation, and TLBI elision leads to vCPUs getting stuck in a
> > permission fault loop. Remember that the architecture permits the TLB to
> > cache translations that precipitate a permission fault.
> 
> The effects of this are slightly overstated (got ahead of myself).
> EAGAIN only crops up if the cmpxchg() fails; we return 0 if the PTE
> didn't need to be updated.
> 
> On the subsequent permission fault we'll do the right thing and
> invalidate the TLB, so this change is purely an optimization rather than
> a correctness issue.

Can you measure the actual effect of this change? In my (limited)
experience, I had to actually trick the guest into doing this, and
opportunistically invalidating TLBs didn't have any significant
benefit.
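
For anyone following along, here is a rough userspace sketch of the race
being discussed. It is not the kernel code: sketch_relax_perms(),
sketch_handle_perm_fault(), SKETCH_PTE_W and sketch_tlbi_count are made-up
names, C11 atomics stand in for cmpxchg(), and a counter stands in for the
local TLBI.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical permission bit; not the real stage-2 descriptor layout. */
#define SKETCH_PTE_W	(UINT64_C(1) << 7)

/* Stand-in for "a local TLB invalidation was issued". */
static int sketch_tlbi_count;

/*
 * Try to relax permissions on *ptep. 'old' is the value the caller saw
 * during its walk. Returns 0 if the PTE was updated or needed no update,
 * -EAGAIN if another thread changed the PTE after the caller's walk
 * observed it.
 */
static int sketch_relax_perms(_Atomic uint64_t *ptep, uint64_t old,
			      uint64_t set)
{
	uint64_t new = old | set;

	if (new == old)
		return 0;	/* nothing to update */

	if (!atomic_compare_exchange_strong(ptep, &old, new))
		return -EAGAIN;	/* lost the race to another walker */

	return 0;
}

/* Invalidate only when the walk returned 0, as described in the thread. */
static int sketch_handle_perm_fault(_Atomic uint64_t *ptep, uint64_t old,
				    uint64_t set)
{
	int ret = sketch_relax_perms(ptep, old, set);

	if (!ret)
		sketch_tlbi_count++;	/* local TLBI would go here */

	return ret;
}
```

In this model the loser of the race skips the invalidation and gets
-EAGAIN, but once it faults again with an up-to-date view of the PTE the
walk returns 0 and the invalidation happens.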

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.