On Mon, 03 Jun 2024 18:28:56 +0100, Oliver Upton <oliver.upton@xxxxxxxxx> wrote: > > Hey, > > On Mon, Jun 03, 2024 at 01:36:54PM +0100, Marc Zyngier wrote: > > [...] > > > > + /* > > > + * Layer the guest hypervisor's trap configuration on top of our own if > > > + * we're in a nested context. > > > + */ > > > + if (!vcpu_has_nv(vcpu) || is_hyp_ctxt(vcpu)) > > > + goto write; > > > + > > > + if (guest_hyp_fpsimd_traps_enabled(vcpu)) > > > + val &= ~CPACR_ELx_FPEN; > > > + if (guest_hyp_sve_traps_enabled(vcpu)) > > > + val &= ~CPACR_ELx_ZEN; > > > > I'm afraid this isn't quite right. You are clearing both FPEN (resp > > ZEN) bits based on any of the two bits being clear, while what we want > > is to actually propagate the 0 bits (and only those). > > An earlier version of the series I had was effectively doing this, > applying the L0 trap configuration on top of L1's CPTR_EL2. Unless I'm > missing something terribly obvious, I think this is still correct, as: > > - If we're in a hyp context, vEL2's CPTR_EL2 is loaded into CPACR_EL1. > The independent EL0/EL1 enable bits are handled by hardware. All this > junk gets skipped and we go directly to writing CPTR_EL2. Yup. > > - If we are not in a hyp context, vEL2's CPTR_EL2 gets folded into the > hardware value for CPTR_EL2. TGE must be 0 in this case, so there is > no conditional trap based on what EL the vCPU is in. There's only two > functional trap states at this point, hence the all-or-nothing > approach. Ah, I see it now. Only bit[0] of each 2-bit field matters in that case. This thing is giving me a headache. > > > What I have in my tree is something along the lines of: > > > > cptr = vcpu_sanitised_cptr_el2(vcpu); > > tmp = cptr & (CPACR_ELx_ZEN_MASK | CPACR_ELx_FPEN_MASK); > > val &= ~(tmp ^ (CPACR_ELx_ZEN_MASK | CPACR_ELx_FPEN_MASK)); > > My hesitation with this is it gives the impression that both trap bits > are significant, but in reality only the LSB is useful. Unless my > understanding is disastrously wrong, of course :) No, you are absolutely right. Although you *are* clearing both bits anyway ;-). > > Anyway, my _slight_ preference is towards keeping what I have if > possible, with a giant comment explaining the reasoning behind it. But I > can take your approach instead too. I think the only arguments for my own solution are: - slightly better codegen (no function call or inlining), and a smaller .text section in switch.o, because the helpers are not cheap: LLVM: 0 .text 00003ef8 (guest_hyp_*_traps_enabled) 0 .text 00003d48 (bit ops) GCC: 0 .text 00002624 (guest_hyp_*_traps_enabled) 0 .text 000024b4 (bit ops) Yes, LLVM is an absolute pig because of BTI... - tracking the guest's bits more precisely may make it easier to debug but these are pretty weak arguments, and I don't really care either way at this precise moment. Thanks, M. -- Without deviation from the norm, progress is not possible.