On Wed, Aug 14, 2024, Aneesh Kumar K.V wrote:
> Sean Christopherson <seanjc@xxxxxxxxxx> writes:
>
> > On Mon, Aug 12, 2024, Aneesh Kumar K.V wrote:
> >> Anish Moorthy <amoorthy@xxxxxxxxxx> writes:
> >>
> >> > Right now userspace just gets a bare EFAULT when the stage-2 fault
> >> > handler fails to fault in the relevant page. Set up a
> >> > KVM_EXIT_MEMORY_FAULT whenever this happens, which at the very least
> >> > eases debugging and might also let userspace decide on/take some
> >> > specific action other than crashing the VM.
> >> >
> >> > In some cases, user_mem_abort() EFAULTs before the size of the fault is
> >> > calculated: return 0 in these cases to indicate that the fault is of
> >> > unknown size.
> >> >
> >>
> >> VMMs are now converting private memory to shared or vice-versa on vcpu
> >> exit due to memory fault. This change will require VMM track each page's
> >> private/shared state so that they can now handle an exit fault on a
> >> shared memory where the fault happened due to reasons other than
> >> conversion.
> >
> > I don't see how filling kvm_run.memory_fault in more locations changes anything.
> > The userspace exits are inherently racy, e.g. userspace may have already converted
> > the page to the appropriate state, thus making KVM's exit spurious. So either
> > the VMM already tracks state, or the VMM blindly converts to shared/private.
>
> I might be missing some details here. The change is adding exit_reason =
> KVM_EXIT_MEMORY_FAULT to code path which would earlier result in VMM
> panics?
>
> For ex:
>
> @@ -1473,6 +1475,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	if (unlikely(!vma)) {
>  		kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
>  		mmap_read_unlock(current->mm);
> +		kvm_prepare_memory_fault_exit(vcpu, fault_ipa, 0,
> +					      write_fault, exec_fault, false);
>  		return -EFAULT;
>  	}
>
> VMMs handle this with code as below
>
> static bool handle_memoryfault(struct kvm_cpu *vcpu)
> {
> 	....
> 	return true;
> }
>
> bool kvm_cpu__handle_exit(struct kvm_cpu *vcpu)
> {
> 	switch (vcpu->kvm_run->exit_reason) {
> 	...
> 	case KVM_EXIT_MEMORY_FAULT:
> 		return handle_memoryfault(vcpu);
> 	}
>
> 	return false;
> }
>
> and the caller did
>
> 	ret = kvm_cpu__handle_exit(cpu);
> 	if (!ret)
> 		goto panic_kvm;
> 	break;
>
> This change will break those VMMs, isn't it? i.e., we will not panic after
> this change?

If the VMM unconditionally resumes the guest on errno=EFAULT, that's a VMM bug.
handle_memoryfault() needs to have some amount of checking to verify that it can
actually resolve the fault that was reported, given the gfn and metadata. In
practice, that means panicking on any gfn that's not associated with a memslot
that has KVM_MEM_GUEST_MEMFD, because prior to this series, it's impossible for
userspace to resolve any faults besides implicit shared<=>private conversions.
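To illustrate the kind of check being described, here is a minimal sketch of how a VMM could gate handle_memoryfault() on whether the faulting gfn lands in a guest_memfd-backed memslot. The struct memslot layout and the find_memslot()/can_resolve_memory_fault() helpers are hypothetical simplifications, not kvmtool's actual data structures; only the KVM_MEM_GUEST_MEMFD flag value comes from the KVM uAPI.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* From <linux/kvm.h>; the flag a memslot sets when backed by guest_memfd. */
#define KVM_MEM_GUEST_MEMFD	(1UL << 2)

/* Hypothetical, simplified view of a VMM's memslot bookkeeping. */
struct memslot {
	uint64_t base_gfn;
	uint64_t npages;
	uint32_t flags;
};

/* Return the memslot covering @gfn, or NULL if none does. */
static struct memslot *find_memslot(struct memslot *slots, int nr, uint64_t gfn)
{
	for (int i = 0; i < nr; i++) {
		if (gfn >= slots[i].base_gfn &&
		    gfn < slots[i].base_gfn + slots[i].npages)
			return &slots[i];
	}
	return NULL;
}

/*
 * Decide whether a KVM_EXIT_MEMORY_FAULT on @gfn is one the VMM can
 * actually resolve.  Only faults on guest_memfd-backed slots can be
 * fixed up by a shared<=>private conversion; for any other gfn the
 * VMM should treat the exit as fatal (i.e. panic) rather than blindly
 * re-entering the guest.
 */
static bool can_resolve_memory_fault(struct memslot *slots, int nr, uint64_t gfn)
{
	struct memslot *slot = find_memslot(slots, nr, gfn);

	return slot && (slot->flags & KVM_MEM_GUEST_MEMFD);
}
```

A handle_memoryfault() built on this would return true (resume the guest) only when can_resolve_memory_fault() holds and the conversion succeeds, and false otherwise so the caller's existing `goto panic_kvm` path still fires.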