On Thu, May 09, 2024, Michael Roth wrote:
> ---
>  arch/x86/kvm/mmu/mmu.c | 30 ++++++++++++++++++++++++++++--
>  1 file changed, 28 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 62ad38b2a8c9..cecd8360378f 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3296,7 +3296,7 @@ static int kvm_handle_noslot_fault(struct kvm_vcpu *vcpu,
>  	return RET_PF_CONTINUE;
>  }
>
> -static bool page_fault_can_be_fast(struct kvm_page_fault *fault)
> +static bool page_fault_can_be_fast(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>  {
>  	/*
>  	 * Page faults with reserved bits set, i.e. faults on MMIO SPTEs, only
> @@ -3307,6 +3307,32 @@ static bool page_fault_can_be_fast(struct kvm_page_fault *fault)
>  	if (fault->rsvd)
>  		return false;
>
> +	/*
> +	 * For hardware-protected VMs, certain conditions like attempting to
> +	 * perform a write to a page which is not in the state that the guest
> +	 * expects it to be in can result in a nested/extended #PF. In this
> +	 * case, the below code might misconstrue this situation as being the
> +	 * result of a write-protected access, and treat it as a spurious case
> +	 * rather than taking any action to satisfy the real source of the #PF
> +	 * such as generating a KVM_EXIT_MEMORY_FAULT. This can lead to the
> +	 * guest spinning on a #PF indefinitely.
> +	 *
> +	 * For now, just skip the fast path for hardware-protected VMs since
> +	 * they don't currently utilize any of this machinery anyway. In the
> +	 * future, these considerations will need to be taken into account if
> +	 * there's any need/desire to re-enable the fast path for
> +	 * hardware-protected VMs.
> +	 *
> +	 * Since software-protected VMs don't have a notion of a shared vs.
> +	 * private that's separate from what KVM is tracking, the above
> +	 * KVM_EXIT_MEMORY_FAULT condition wouldn't occur, so avoid the
> +	 * special handling for that case for now.

Very technically, it can occur if userspace _just_ modified the attributes.  And
as I've said multiple times, at least for now, I want to avoid special casing
SW-protected VMs unless it is *absolutely* necessary, because their sole purpose
is to allow testing flows that are impossible to exercise without SNP/TDX
hardware.

> +	 */
> +	if (kvm_slot_can_be_private(fault->slot) &&
> +	    !(IS_ENABLED(CONFIG_KVM_SW_PROTECTED_VM) &&
> +	      vcpu->kvm->arch.vm_type == KVM_X86_SW_PROTECTED_VM))

Heh, !(x && y) kills me, I misread this like 4 times.

Anyways, I don't like the heuristic.  It doesn't tie the restriction back to the
cause in any reasonable way.  Can't this simply be?

	if (fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn))
		return false;

Which is much, much more self-explanatory.
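To make the intent of that check concrete, here is a rough userspace sketch of
the idea: bail from the fast path whenever the guest's view of the access
(private vs. shared) disagrees with the attributes KVM is tracking for the gfn.
Everything prefixed with toy_ is a hypothetical stand-in invented for
illustration, not the real KVM types or helpers, and the real
page_fault_can_be_fast() performs further checks that are omitted here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only; NOT the real KVM definitions. */
typedef uint64_t gfn_t;

#define TOY_NR_GFNS 8

struct toy_kvm {
	/* Models the per-gfn "private" memory attribute set by userspace. */
	bool gfn_is_private[TOY_NR_GFNS];
};

struct toy_page_fault {
	gfn_t gfn;
	bool rsvd;		/* reserved bits set, i.e. fault on an MMIO SPTE */
	bool is_private;	/* the guest accessed the gfn as private */
};

/* Assumed analogue of kvm_mem_is_private(): what KVM currently tracks. */
static bool toy_mem_is_private(const struct toy_kvm *kvm, gfn_t gfn)
{
	return gfn < TOY_NR_GFNS && kvm->gfn_is_private[gfn];
}

static bool toy_page_fault_can_be_fast(const struct toy_kvm *kvm,
				       const struct toy_page_fault *fault)
{
	/* MMIO faults never take the fast path. */
	if (fault->rsvd)
		return false;

	/*
	 * A private/shared mismatch needs the slow path so that the real
	 * cause can be handled (e.g. by exiting to userspace), instead of
	 * being misread as a spurious write-protection fault.
	 */
	if (fault->is_private != toy_mem_is_private(kvm, fault->gfn))
		return false;

	/* Simplification: the remaining fast-path conditions are not modeled. */
	return true;
}
```

With this shape the restriction is tied directly to its cause, and no VM-type
special casing is needed: a fault whose is_private matches the tracked
attribute stays eligible, and a mismatched one is always sent to the slow path.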