Re: [PATCH v2 4/6] KVM: x86/mmu: Recover TDP MMU huge page mappings in-place instead of zapping

Sean Christopherson <seanjc@xxxxxxxxxx> · Wed, 30 Oct 2024 16:42:03 -0700

On Wed, Oct 09, 2024, Vipin Sharma wrote:
> On Wed, Oct 9, 2024 at 10:35 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> >
> > On Wed, Oct 09, 2024, Vipin Sharma wrote:
> > > On Fri, Aug 23, 2024 at 4:57 PM David Matlack <dmatlack@xxxxxxxxxx> wrote:
> > > > +static u64 modify_spte_protections(u64 spte, u64 set, u64 clear)
> > > >  {
> > > >         bool is_access_track = is_access_track_spte(spte);
> > > >
> > > >         if (is_access_track)
> > > >                 spte = restore_acc_track_spte(spte);
> > > >
> > > > -       spte &= ~shadow_nx_mask;
> > > > -       spte |= shadow_x_mask;
> > > > +       spte = (spte | set) & ~clear;
> > >
> > > We should add a check here WARN_ON_ONCE(set & clear) because if both
> > > have a common bit set to 1 then the result  will be different between:
> > > 1. spte = (spt | set) & ~clear
> > > 2. spte = (spt | ~clear) & set
> > >
> > > In the current form, 'clear' has more authority in the final value of spte.
> >
> > KVM_MMU_WARN_ON(), overlapping @set and @clear is definitely something that should
> > be caught during development, i.e. we don't need to carry the WARN_ON_ONCE() in
> > production kernels
> >
> > > > +u64 make_huge_spte(struct kvm *kvm, u64 small_spte, int level)
> > > > +{
> > > > +       u64 huge_spte;
> > > > +
> > > > +       if (KVM_BUG_ON(!is_shadow_present_pte(small_spte), kvm))
> > > > +               return SHADOW_NONPRESENT_VALUE;
> > > > +
> > > > +       if (KVM_BUG_ON(level == PG_LEVEL_4K, kvm))
> > > > +               return SHADOW_NONPRESENT_VALUE;
> > > > +
> > >
> > > KVM_BUG_ON() is very aggressive. We should replace it with WARN_ON_ONCE()
> >
> > I'm tempted to say KVM_MMU_WARN_ON() here too.
> 
> I am fine with KVM_MMU_WARN_ON() here. Callers should check for the
> value they provided and returned from this API and if it's important
> to them in Production then decide on next steps accordingly.

Coming back to this, I opted to match the behavior of make_small_spte() and do:

	KVM_BUG_ON(!is_shadow_present_pte(small_spte) || level == PG_LEVEL_4K, kvm);

As explained in commit 3d4415ed75a57, the scenario is meant to be impossible.
If the check fails in production, odds are good there's SPTE memory corruption
and we _want_ to kill the VM.

    KVM: x86/mmu: Bug the VM if KVM tries to split a !hugepage SPTE

    Bug the VM instead of simply warning if KVM tries to split a SPTE that is
    non-present or not-huge.  KVM is guaranteed to end up in a broken state as
    the callers fully expect a valid SPTE, e.g. the shadow MMU will add an
    rmap entry, and all MMUs will account the expected small page.  Returning
    '0' is also technically wrong now that SHADOW_NONPRESENT_VALUE exists,
    i.e. would cause KVM to create a potential #VE SPTE.

    While it would be possible to have the callers gracefully handle failure,
    doing so would provide no practical value as the scenario really should be
    impossible, while the error handling would add a non-trivial amount of
    noise.

There's also no need to return SHADOW_NONPRESENT_VALUE.  KVM_BUG_ON() ensures
all vCPUs are kicked out of the guest, so while the return SPTE may be a bit
nonsensical, it will never be consumed by hardware.  Theoretically, KVM could
wander down a weird path in the future, but again, the most likely scenario is
that there was host memory corruption, so potential weird paths are the least of
KVM's worries at that point.

More importantly, in the _current_ code, returning SHADOW_NONPRESENT_VALUE happens
to be benign, but that's 100% due to make_huge_spte() only being used by the TDP
MMU.  If the shaduw MMU ever started using make_huge_spte(), returning a !present
SPTE would be all but guaranteed to cause fatal problems.