Re: [PATCH v3 15/21] KVM: arm64: Support FEAT_FPMR for guests

Marc Zyngier <maz@xxxxxxxxxx> · Thu, 07 Dec 2023 14:06:34 +0000

On Thu, 07 Dec 2023 12:30:45 +0000,
Mark Brown <broonie@xxxxxxxxxx> wrote:
> 
> On Thu, Dec 07, 2023 at 08:39:46AM +0000, Marc Zyngier wrote:
> > Mark Brown <broonie@xxxxxxxxxx> wrote:
> 
> > >  #define HCRX_GUEST_FLAGS \
> > > -	(HCRX_EL2_SMPME | HCRX_EL2_TCR2En | \
> > > +	(HCRX_EL2_SMPME | HCRX_EL2_TCR2En | HCRX_EL2_EnFPM | \
> 
> > We really should start making all of these things conditional. See
> > below.
> 
> Is there an overarching theory behind how these things are intended to
> work?  I agree with you that I'd have expected more conditionality here,
> I was trying to fit in with the existing pattern.  It's kind of hard to
> follow what the intention is, I think to some extent due to things
> having evolved over time.

The intention is that *everything* becomes conditional, based on both
the host CPU support and the feature set advertised to the guest.
Which means that we don't stuff that isn't advertised to the guest,
and instead make these things UNDEF. Just like on any real CPU
implementation.

>
> > > @@ -517,7 +519,6 @@ struct kvm_vcpu_arch {
> > >  	enum fp_type fp_type;
> > >  	unsigned int sve_max_vl;
> > >  	u64 svcr;
> > > -	u64 fpmr;
> 
> > Why do this change here? Why isn't done like that the first place?
> 
> It didn't seem right to add the register to struct vcpu_sysreg before it
> was handled by KVM.  As referenced in the cover letter normally this
> wouldn't come up because KVM doesn't rely on the host kernel for
> managing register state so we add KVM support then enable the host
> kernel but for FPSIMD we're reusing fpsimd_save() so we need the host
> kernel support to be in place when we enable KVM.

That doesn't explain why you can't be upfront with it and populate the
FPMR entry. In either case, you are wasting a u64.

> 
> > >  	CGT_MDCR_TDE,
> > > @@ -279,6 +281,12 @@ static const struct trap_bits coarse_trap_bits[] = {
> > >  		.mask		= HCR_TTLBOS,
> > >  		.behaviour	= BEHAVE_FORWARD_ANY,
> > >  	},
> > > +	[CGT_HCRX_EnFPM] = {
> > > +		.index		= HCRX_EL2,
> > > +		.value		= HCRX_EL2_EnFPM,
> > > +		.mask		= HCRX_EL2_EnFPM,
> > > +		.behaviour	= BEHAVE_FORWARD_ANY,
> 
> > This looks wrong. HCRX_EL2.EnFPM is an enable bit.
> 
> Right, it's the wrong way round.
> 
> > > +static void *fpsimd_share_end(struct user_fpsimd_state *fpsimd)
> > > +{
> > > +	void *share_end = fpsimd + 1;
> > > +
> > > +	if (cpus_have_final_cap(ARM64_HAS_FPMR))
> > > +		share_end += sizeof(u64);
> > > +
> > > +	return share_end;
> > > +}
> 
> > This is horrible. Why can't you just have a new structure wrapping
> > both user_fpsimd_state and fpmr? This is going to break in subtle
> > ways, just like the SVE/SME stuff.
> 
> I agree that it's not great, the main issue was that fpsimd_state is
> both already embedded in uw for hardened usercopy and very widely
> referenced by exactly which struct it's in so I was taking a guess as to
> what would get the least objections.  The obvious thing would be to add
> FPMR to uw and share the whole thing with the hypervisor, if people
> don't mind adding another field to uw I could do that?

Either that, or you create a KVM-specific structure that contains
these fields. If that results in KVM changes, so be it. But I won't
take this sort of pointer arithmetic that assumes some pre-defined
layout.

> 
> > >  	vcpu->arch.host_fpsimd_state = kern_hyp_va(fpsimd);
> > > +	if (cpus_have_final_cap(ARM64_HAS_FPMR)) {
> > > +		WARN_ON_ONCE(&current->thread.fpmr + 1 != fpsimd_share_end(fpsimd));
> 
> > How can this happen?
> 
> It shouldn't, but it'd be bad if it did so I put a check in to make sure
> we haven't messed up.

See my earlier point: you shouldn't have to check if you used a data
structure.

> 
> > > +		vcpu->arch.host_fpmr = kern_hyp_va(&current->thread.fpmr);
> > > +	}
> 
> > We really need to stop piling the save/restore of stuff that isn't
> > advertised to the guest.
> 
> I'm not clear what you're referencing here?  The feature is advertised
> to the guest via the ID registers and in the past you've pushed back on
> making things where the state is just a single register like this
> optional.  Do you mean that we should be making this conditional on the
> guest ID registers?  If that is the case is there a plan for how that's
> supposed to work, set flags when kvm_vcpu_run_pid_change() happens for
> example?

See the beginning of this email. It is high time that we stop enabling
everything by default, because this totally breaks VM migration. We
already have a huge backlog of these things, and I don't want to add
more of it.

Which means that at the very least, enabling *any* feature also comes
with sanitising the state one way or another when this feature is
disabled by userspace.

How this is being done is still a work in progress: my current plan is
based on a set of trap bits that are computed on a per-VM basis, and
some side state that indicates whether the trap handling is for
emulation or feature disabling purpose. This will probably reuse the
NV infrastructure which has an exhaustive list of the sysregs that can
be trapped from EL0/EL1.

At the very least, userspace shouldn't be able to observe the state
that a guest isn't supposed to generate, and we should be mindful of
not creating covert channels.

	M.

-- 
Without deviation from the norm, progress is not possible.