Re: [PATCH 10/11] KVM: arm64: nv: Honor guest hypervisor's FP/SVE traps in CPTR_EL2

Marc Zyngier <maz@xxxxxxxxxx> · Tue, 04 Jun 2024 12:14:42 +0100

On Mon, 03 Jun 2024 18:28:56 +0100,
Oliver Upton <oliver.upton@xxxxxxxxx> wrote:
> 
> Hey,
> 
> On Mon, Jun 03, 2024 at 01:36:54PM +0100, Marc Zyngier wrote:
> 
> [...]
> 
> > > +	/*
> > > +	 * Layer the guest hypervisor's trap configuration on top of our own if
> > > +	 * we're in a nested context.
> > > +	 */
> > > +	if (!vcpu_has_nv(vcpu) || is_hyp_ctxt(vcpu))
> > > +		goto write;
> > > +
> > > +	if (guest_hyp_fpsimd_traps_enabled(vcpu))
> > > +		val &= ~CPACR_ELx_FPEN;
> > > +	if (guest_hyp_sve_traps_enabled(vcpu))
> > > +		val &= ~CPACR_ELx_ZEN;
> > 
> > I'm afraid this isn't quite right. You are clearing both FPEN (resp
> > ZEN) bits based on any of the two bits being clear, while what we want
> > is to actually propagate the 0 bits (and only those).
> 
> An earlier version of the series I had was effectively doing this,
> applying the L0 trap configuration on top of L1's CPTR_EL2. Unless I'm
> missing something terribly obvious, I think this is still correct, as:
> 
>  - If we're in a hyp context, vEL2's CPTR_EL2 is loaded into CPACR_EL1.
>    The independent EL0/EL1 enable bits are handled by hardware. All this
>    junk gets skipped and we go directly to writing CPTR_EL2.

Yup.

> 
>  - If we are not in a hyp context, vEL2's CPTR_EL2 gets folded into the
>    hardware value for CPTR_EL2. TGE must be 0 in this case, so there is
>    no conditional trap based on what EL the vCPU is in. There's only two
>    functional trap states at this point, hence the all-or-nothing
>    approach.

Ah, I see it now. Only bit[0] of each 2-bit field matters in that
case. This thing is giving me a headache.

> 
> > What I have in my tree is something along the lines of:
> > 
> > 	cptr = vcpu_sanitised_cptr_el2(vcpu);
> > 	tmp = cptr & (CPACR_ELx_ZEN_MASK | CPACR_ELx_FPEN_MASK);
> > 	val &= ~(tmp ^ (CPACR_ELx_ZEN_MASK | CPACR_ELx_FPEN_MASK));
> 
> My hesitation with this is it gives the impression that both trap bits
> are significant, but in reality only the LSB is useful. Unless my
> understanding is disastrously wrong, of course :)

No, you are absolutely right. Although you *are* clearing both bits
anyway ;-).

> 
> Anyway, my _slight_ preference is towards keeping what I have if
> possible, with a giant comment explaining the reasoning behind it. But I
> can take your approach instead too.

I think the only arguments for my own solution are:

- slightly better codegen (no function call or inlining), and a
  smaller .text section in switch.o, because the helpers are not
  cheap:

  LLVM:

	0 .text         00003ef8 (guest_hyp_*_traps_enabled)
	0 .text         00003d48 (bit ops)

  GCC:
	0 .text         00002624 (guest_hyp_*_traps_enabled)
	0 .text         000024b4 (bit ops)

  Yes, LLVM is an absolute pig because of BTI...

- tracking the guest's bits more precisely may make it easier to debug

but these are pretty weak arguments, and I don't really care either
way at this precise moment.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.