Re: [RESEND RFC PATCH v1 2/5] arm64: Add BBM Level 2 cpu feature

Will Deacon <will@xxxxxxxxxx> · Fri, 3 Jan 2025 15:35:13 +0000

On Thu, Jan 02, 2025 at 12:30:34PM +0000, Marc Zyngier wrote:
> On Thu, 02 Jan 2025 12:07:04 +0000,
> Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx> wrote:
> > On Thu, 19 Dec 2024 16:45:28 +0000
> > Will Deacon <will@xxxxxxxxxx> wrote:
> > > On Thu, Dec 12, 2024 at 04:03:52PM +0000, Ryan Roberts wrote:
> > > > > >>> If anything, this should absolutely check for FAR_EL1 and assert that
> > > > > >>> this is indeed caused by such a change.
> > > > >>
> > > > >> I'm not really sure how we would check this reliably? Without patch 5, the
> > > > >> problem is somewhat constrained; we could have as many changes in flight as
> > > > >> there are CPUs so we could keep a list of all the {mm_struct, VA-range} that are
> > > > >> being modified. But if patch 5 is confirmed to be architecturally sound, then
> > > > >> there is no "terminating tlbi" so there is no bound on the set of {mm_struct,
> > > > >> VA-range}'s that could legitimately cause a conflict abort.  
> > > > > 
> > > > > I didn't mean to imply that we should identify the exact cause of the
> > > > > abort. I was hoping to simply check that FAR_EL1 reports a userspace
> > > > > VA. Why wouldn't that work?  
> > > > 
> > > > Ahh gottya! Yes agreed, this sounds like the right approach.  
> > > 
> > > Please, can we just not bother handling conflict aborts at all outside of
> > > KVM? This is all dead code, it's complicated and it doesn't scale to the
> > > in-kernel use-cases that others want. There's also not been any attempt
> > > to add the pKVM support for handling host-side conflict aborts from what
> > > I can tell.
> > > 
> > > For now, I would suggest limiting this series just to the KVM support
> > > for handling a broken/malicious guest. If the contpte performance
> > > improvements are worthwhile (I've asked for data), then let's add support
> > > for the CPUs that handle the conflict in hardware (I believe this is far
> > > more common than reporting the abort) so that the in-kernel users can
> > > benefit whilst keeping the code manageable at the same time.
> > > 
> > 
> > Given the direction the discussion is going in, it's time to raise a hand.
> > 
> > Huawei has implementations that support BBML2 and might report a TLB conflict
> > abort after changing block size directly, until an appropriate TLB invalidation
> > instruction completes. This Implementation Choice is architecturally compliant.
> 
> Compliant, absolutely. That's the letter of the spec. The usefulness
> aspect is, however, more debatable, and this is what Will is pointing
> out.
> 
> Dealing with TLB Conflict aborts is an absolute pain if you need
> to handle it within the same Translation Regime and using the same
> TTBR as the one that has generated the fault. So at least for the time
> being, it might be preferable to only worry about the implementations
> that will promise to never generate such an abort and quietly perform
> an invalidation behind the kernel's back.

Agreed. We're not dropping support for CPUs that don't give us what we'd
like here, we're just not bending over backwards to port and maintain new
optimisations for them. I think that's a reasonable compromise?

That said, thanks for raising this, Jonathan. It's a useful data point
to know that TLB conflict aborts exist in the wild!

Will