Re: [PATCH 06/16] KVM: arm64: Force a full unmap on vpcu reinit

Will Deacon <will@xxxxxxxxxx> · Fri, 30 Jul 2021 13:50:09 +0100

On Wed, Jul 28, 2021 at 11:38:35AM +0100, Marc Zyngier wrote:
> On Tue, 27 Jul 2021 19:11:33 +0100,
> Will Deacon <will@xxxxxxxxxx> wrote:
> > 
> > On Thu, Jul 15, 2021 at 05:31:49PM +0100, Marc Zyngier wrote:
> > > As we now keep information in the S2PT, we must be careful not
> > > to keep it across a VM reboot, which could otherwise lead to
> > > interesting problems.
> > > 
> > > Make sure that the S2 is completely discarded on reset of
> > > a vcpu, and remove the flag that enforces the MMIO check.
> > > 
> > > Signed-off-by: Marc Zyngier <maz@xxxxxxxxxx>
> > > ---
> > >  arch/arm64/kvm/arm.c | 8 +++++++-
> > >  1 file changed, 7 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > > index 97ab1512c44f..b0d2225190d2 100644
> > > --- a/arch/arm64/kvm/arm.c
> > > +++ b/arch/arm64/kvm/arm.c
> > > @@ -1096,12 +1096,18 @@ static int kvm_arch_vcpu_ioctl_vcpu_init(struct kvm_vcpu *vcpu,
> > >  	 * ensuring that the data side is always coherent. We still
> > >  	 * need to invalidate the I-cache though, as FWB does *not*
> > >  	 * imply CTR_EL0.DIC.
> > > +	 *
> > > +	 * If the MMIO guard was enabled, we pay the price of a full
> > > +	 * unmap to get back to a sane state (and clear the flag).
> > >  	 */
> > >  	if (vcpu->arch.has_run_once) {
> > > -		if (!cpus_have_final_cap(ARM64_HAS_STAGE2_FWB))
> > > +		if (!cpus_have_final_cap(ARM64_HAS_STAGE2_FWB) ||
> > > +		    test_bit(KVM_ARCH_FLAG_MMIO_GUARD, &vcpu->kvm->arch.flags))
> > >  			stage2_unmap_vm(vcpu->kvm);
> > >  		else
> > >  			icache_inval_all_pou();
> > > +
> > > +		clear_bit(KVM_ARCH_FLAG_MMIO_GUARD, &vcpu->kvm->arch.flags);
> > 
> > What prevents this racing with another vCPU trying to set the bit?
> 
> Not much. We could take the kvm lock on both ends to serialize it, but
> that's pretty ugly. And should we care? What is the semantic of
> resetting a vcpu while another is still running?

It's definitely weird, but given that this is an attack vector I don't think
we can rule out attackers trying whacky stuff like this (although maybe
we end up forbidding vcpu reset in pKVM -- dunno).

> If we want to support this sort of behaviour, then our tracking is
> totally bogus, because it is VM-wide. And you don't even have to play
> with that bit from another vcpu: all the information is lost at the
> point where we unmap the S2 PTs.
> 
> Maybe an alternative is to move this to the halt/reboot PSCI handlers,
> making it clearer what we expect?

I think that's probably worth looking at. The race is quite hard to reason
about otherwise, so if clearing the bit can be done on the teardown path
in single-threaded context then I think that's better. It looks like
kvm_prepare_system_event() has all the synchronisation we need there.

Will