On Wed, Oct 30, 2024 at 4:42 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote: > > On Wed, Oct 09, 2024, Vipin Sharma wrote: > > Coming back to this, I opted to match the behavior of make_small_spte() and do: > > KVM_BUG_ON(!is_shadow_present_pte(small_spte) || level == PG_LEVEL_4K, kvm); > Should these be two separate KVM_BUG_ON(), to aid in debugging? > As explained in commit 3d4415ed75a57, the scenario is meant to be impossible. > If the check fails in production, odds are good there's SPTE memory corruption > and we _want_ to kill the VM. > > KVM: x86/mmu: Bug the VM if KVM tries to split a !hugepage SPTE > > Bug the VM instead of simply warning if KVM tries to split a SPTE that is > non-present or not-huge. KVM is guaranteed to end up in a broken state as > the callers fully expect a valid SPTE, e.g. the shadow MMU will add an > rmap entry, and all MMUs will account the expected small page. Returning > '0' is also technically wrong now that SHADOW_NONPRESENT_VALUE exists, > i.e. would cause KVM to create a potential #VE SPTE. > > While it would be possible to have the callers gracefully handle failure, > doing so would provide no practical value as the scenario really should be > impossible, while the error handling would add a non-trivial amount of > noise. > > There's also no need to return SHADOW_NONPRESENT_VALUE. KVM_BUG_ON() ensures > all vCPUs are kicked out of the guest, so while the return SPTE may be a bit > nonsensical, it will never be consumed by hardware. Theoretically, KVM could > wander down a weird path in the future, but again, the most likely scenario is > that there was host memory corruption, so potential weird paths are the least of > KVM's worries at that point. > > More importantly, in the _current_ code, returning SHADOW_NONPRESENT_VALUE happens > to be benign, but that's 100% due to make_huge_spte() only being used by the TDP > MMU. If the shaduw MMU ever started using make_huge_spte(), returning a !present > SPTE would be all but guaranteed to cause fatal problems. I think the caller should be given the opportunity to handle a failure. In the current code, TDP is able to handle the error condition, so penalizing a VM seems wrong. We have gone from a state of reduced performance to either very good performance or VM being killed. If shadow MMU starts using make_huge_spte() and doesn't add logic to handle this scenario (killing vm or something else) then that is a coding bug of that feature which should be fixed.