Re: [PATCH v2 4/6] KVM: x86/mmu: Recover TDP MMU huge page mappings in-place instead of zapping

Vipin Sharma <vipinsh@xxxxxxxxxx> · Fri, 1 Nov 2024 13:37:56 -0700

On Wed, Oct 30, 2024 at 4:42 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Wed, Oct 09, 2024, Vipin Sharma wrote:
>
> Coming back to this, I opted to match the behavior of make_small_spte() and do:
>
>         KVM_BUG_ON(!is_shadow_present_pte(small_spte) || level == PG_LEVEL_4K, kvm);
>

Should these be two separate KVM_BUG_ON(), to aid in debugging?

> As explained in commit 3d4415ed75a57, the scenario is meant to be impossible.
> If the check fails in production, odds are good there's SPTE memory corruption
> and we _want_ to kill the VM.
>
>     KVM: x86/mmu: Bug the VM if KVM tries to split a !hugepage SPTE
>
>     Bug the VM instead of simply warning if KVM tries to split a SPTE that is
>     non-present or not-huge.  KVM is guaranteed to end up in a broken state as
>     the callers fully expect a valid SPTE, e.g. the shadow MMU will add an
>     rmap entry, and all MMUs will account the expected small page.  Returning
>     '0' is also technically wrong now that SHADOW_NONPRESENT_VALUE exists,
>     i.e. would cause KVM to create a potential #VE SPTE.
>
>     While it would be possible to have the callers gracefully handle failure,
>     doing so would provide no practical value as the scenario really should be
>     impossible, while the error handling would add a non-trivial amount of
>     noise.
>
> There's also no need to return SHADOW_NONPRESENT_VALUE.  KVM_BUG_ON() ensures
> all vCPUs are kicked out of the guest, so while the return SPTE may be a bit
> nonsensical, it will never be consumed by hardware.  Theoretically, KVM could
> wander down a weird path in the future, but again, the most likely scenario is
> that there was host memory corruption, so potential weird paths are the least of
> KVM's worries at that point.
>
> More importantly, in the _current_ code, returning SHADOW_NONPRESENT_VALUE happens
> to be benign, but that's 100% due to make_huge_spte() only being used by the TDP
> MMU.  If the shaduw MMU ever started using make_huge_spte(), returning a !present
> SPTE would be all but guaranteed to cause fatal problems.

I think the caller should be given the opportunity to handle a
failure. In the current code, TDP is able to handle the error
condition, so penalizing a VM seems wrong. We have gone from a state
of reduced performance to either very good performance or VM being
killed.

If shadow MMU starts using make_huge_spte() and doesn't add logic to
handle this scenario (killing vm or something else) then that is a
coding bug of that feature which should be fixed.