For hardware-protected VMs like SEV-SNP guests, certain conditions,
such as attempting to write to a page that is not in the state the
guest expects it to be in, can result in a nested/extended #PF that
can only be satisfied by the host performing an implicit page state
change to transition the page into the expected shared/private state.
This is generally handled by generating a KVM_EXIT_MEMORY_FAULT event
that gets forwarded to userspace to handle via
KVM_SET_MEMORY_ATTRIBUTES.

However, the fast_page_fault() code might misconstrue this situation
as the result of a write-protected access and treat it as spurious
when it sees that writes are already allowed for the sPTE. The KVM MMU
then tries to resume the guest rather than taking any action to
satisfy the real source of the #PF, such as generating a
KVM_EXIT_MEMORY_FAULT, leaving the guest spinning on nested #PFs.

For now, just skip the fast path for hardware-protected VMs, since
they don't currently utilize any of this access-tracking machinery
anyway. In the future, these considerations will need to be taken into
account if there's any need/desire to re-enable the fast path for
hardware-protected VMs.

Since software-protected VMs don't have a notion of shared vs. private
that's separate from what KVM is tracking, the above
KVM_EXIT_MEMORY_FAULT condition wouldn't occur, so avoid the special
handling for that case for now.
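As a rough sketch of the userspace side of this flow (not part of this
patch, and assuming the uAPI names from the merged guest_memfd series:
KVM_EXIT_MEMORY_FAULT, KVM_MEMORY_EXIT_FLAG_PRIVATE,
KVM_SET_MEMORY_ATTRIBUTES, KVM_MEMORY_ATTRIBUTE_PRIVATE, where vm_fd
is the VM file descriptor and run is the vCPU's mmap()'d kvm_run), a
VMM would service the exit along these lines:

  #include <stdio.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /*
   * Hypothetical VMM helper: service a KVM_EXIT_MEMORY_FAULT by
   * performing the implicit page state change the guest is waiting
   * on, i.e. flip the faulting range's shared/private attribute to
   * match what the guest expects.
   */
  static void handle_memory_fault_exit(int vm_fd, struct kvm_run *run)
  {
          struct kvm_memory_attributes attrs = {
                  .address = run->memory_fault.gpa,
                  .size = run->memory_fault.size,
                  /* Private access faulted? Mark the range private. */
                  .attributes = (run->memory_fault.flags &
                                 KVM_MEMORY_EXIT_FLAG_PRIVATE) ?
                                KVM_MEMORY_ATTRIBUTE_PRIVATE : 0,
          };

          if (ioctl(vm_fd, KVM_SET_MEMORY_ATTRIBUTES, &attrs) < 0)
                  perror("KVM_SET_MEMORY_ATTRIBUTES");
  }

The VMM then re-runs the vCPU, at which point KVM can resolve the
fault against the updated attributes instead of bouncing the nested
#PF back into the guest as described above.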
Cc: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
Signed-off-by: Michael Roth <michael.roth@xxxxxxx>
---
 arch/x86/kvm/mmu/mmu.c | 30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 62ad38b2a8c9..cecd8360378f 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3296,7 +3296,7 @@ static int kvm_handle_noslot_fault(struct kvm_vcpu *vcpu,
 	return RET_PF_CONTINUE;
 }
 
-static bool page_fault_can_be_fast(struct kvm_page_fault *fault)
+static bool page_fault_can_be_fast(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
 	/*
 	 * Page faults with reserved bits set, i.e. faults on MMIO SPTEs, only
@@ -3307,6 +3307,32 @@ static bool page_fault_can_be_fast(struct kvm_page_fault *fault)
 	if (fault->rsvd)
 		return false;
 
+	/*
+	 * For hardware-protected VMs, certain conditions like attempting to
+	 * perform a write to a page which is not in the state that the guest
+	 * expects it to be in can result in a nested/extended #PF. In this
+	 * case, the below code might misconstrue this situation as being the
+	 * result of a write-protected access, and treat it as a spurious case
+	 * rather than taking any action to satisfy the real source of the #PF
+	 * such as generating a KVM_EXIT_MEMORY_FAULT. This can lead to the
+	 * guest spinning on a #PF indefinitely.
+	 *
+	 * For now, just skip the fast path for hardware-protected VMs since
+	 * they don't currently utilize any of this machinery anyway. In the
+	 * future, these considerations will need to be taken into account if
+	 * there's any need/desire to re-enable the fast path for
+	 * hardware-protected VMs.
+	 *
+	 * Since software-protected VMs don't have a notion of a shared vs.
+	 * private that's separate from what KVM is tracking, the above
+	 * KVM_EXIT_MEMORY_FAULT condition wouldn't occur, so avoid the
+	 * special handling for that case for now.
+	 */
+	if (kvm_slot_can_be_private(fault->slot) &&
+	    !(IS_ENABLED(CONFIG_KVM_SW_PROTECTED_VM) &&
+	      vcpu->kvm->arch.vm_type == KVM_X86_SW_PROTECTED_VM))
+		return false;
+
 	/*
 	 * #PF can be fast if:
 	 *
@@ -3407,7 +3433,7 @@ static int fast_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	u64 *sptep;
 	uint retry_count = 0;
 
-	if (!page_fault_can_be_fast(fault))
+	if (!page_fault_can_be_fast(vcpu, fault))
 		return ret;
 
 	walk_shadow_page_lockless_begin(vcpu);
-- 
2.25.1