On Thu, May 09, 2024, Michael Roth wrote: > The intended logic when handling #NPFs with the RMP bit set (31) is to > first check to see if the #NPF requires a shared<->private transition > and, if so, to go ahead and let the corresponding KVM_EXIT_MEMORY_FAULT > get forwarded on to userspace before proceeding with any handling of > other potential RMP fault conditions like needing to PSMASH the RMP > entry/etc (which will be done later if the guest still re-faults after > the KVM_EXIT_MEMORY_FAULT is processed by userspace). > > The determination of whether any userspace handling of > KVM_EXIT_MEMORY_FAULT is needed is done by interpreting the return code > of kvm_mmu_page_fault(). However, the current code misinterprets the > return code, expecting 0 to indicate a userspace exit rather than less > than 0 (-EFAULT). This leads to the following unexpected behavior: > > - for KVM_EXIT_MEMORY_FAULTs resulting for implicit shared->private > conversions, warnings get printed from sev_handle_rmp_fault() > because it does not expect to be called for GPAs where > KVM_MEMORY_ATTRIBUTE_PRIVATE is not set. Standard linux guests don't > generally do this, but it is allowed and should be handled > similarly to private->shared conversions rather than triggering any > sort of warnings > > - if gmem support for 2MB folios is enabled (via currently out-of-tree > code), implicit shared<->private conversions will always result in > a PSMASH being attempted, even if it's not actually needed to > resolve the RMP fault. This doesn't cause any harm, but results in a > needless PSMASH and zapping of the sPTE > > Resolve these issues by calling sev_handle_rmp_fault() only when > kvm_mmu_page_fault()'s return code is greater than or equal to 0, > indicating a KVM_MEMORY_EXIT_FAULT/-EFAULT isn't needed. While here, > simplify the code slightly and fix up the associated comments for better > clarity. > > Fixes: ccc9d836c5c3 ("KVM: SEV: Add support to handle RMP nested page faults") > > Signed-off-by: Michael Roth <michael.roth@xxxxxxx> > --- > arch/x86/kvm/svm/svm.c | 10 ++++------ > 1 file changed, 4 insertions(+), 6 deletions(-) > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c > index 426ad49325d7..9431ce74c7d4 100644 > --- a/arch/x86/kvm/svm/svm.c > +++ b/arch/x86/kvm/svm/svm.c > @@ -2070,14 +2070,12 @@ static int npf_interception(struct kvm_vcpu *vcpu) > svm->vmcb->control.insn_len); > > /* > - * rc == 0 indicates a userspace exit is needed to handle page > - * transitions, so do that first before updating the RMP table. > + * rc < 0 indicates a userspace exit may be needed to handle page > + * attribute updates, so deal with that first before handling other > + * potential RMP fault conditions. > */ > - if (error_code & PFERR_GUEST_RMP_MASK) { > - if (rc == 0) > - return rc; > + if (rc >= 0 && error_code & PFERR_GUEST_RMP_MASK) This isn't correct either. A return of '0' also indiciates "exit to userspace", it just doesn't happen with SNP because '0' is returned only when KVM attempts emulation, and that too gets short-circuited by svm_check_emulate_instruction(). And I would honestly drop the comment, KVM's less-than-pleasant 1/0/-errno return values overload is ubiquitous enough that it should be relatively self-explanatory. Or if you prefer to keep a comment, drop the part that specifically calls out attributes updates, because that incorrectly implies that's the _only_ reason why KVM checks the return. But my vote is to drop the comment, because it essentially becomes "don't proceed to step 2 if step 1 failed", which kind of makes the reader go "well, yeah".