On Wed, May 19, 2021, James Houghton wrote:
> This patch has been written to support page-ins using userfaultfd's
> SIGBUS feature. When a userfaultfd is created with UFFD_FEATURE_SIGBUS,
> `handle_userfault` will return VM_FAULT_SIGBUS instead of putting the
> calling thread to sleep. Normal (non-guest) threads that access memory
> that has been registered with a UFFD_FEATURE_SIGBUS userfaultfd receive
> a SIGBUS.
>
> When a vCPU gets an EPT page fault in a userfaultfd-registered region,
> KVM calls into `handle_userfault` to resolve the page fault. With
> UFFD_FEATURE_SIGBUS, VM_FAULT_SIGBUS is returned, but a SIGBUS is never
> delivered to the userspace thread. This patch propagates the
> VM_FAULT_SIGBUS error up to KVM, where we then send the signal.
>
> Upon receiving a VM_FAULT_SIGBUS, the KVM_RUN ioctl will exit to
> userspace. This functionality already exists.

I would strongly prefer to fix this in KVM by returning a KVM-specific
exit reason (instead of -EFAULT), with additional information provided
in vcpu->run, e.g. address, etc...

VirtioFS has (had?) a similar problem with a file being truncated in the
host and the guest being killed as a result due to KVM returning -EFAULT
without any useful information[*]. That idea never got picked up, but
I'm 99% certain the solution would provide exactly the functionality you
want.

[*] https://lkml.kernel.org/r/20200617230052.GB27751@xxxxxxxxxxxxxxx

Handling this purely in KVM would have several advantages:

 - No need to plumb @fault_error around mm/. KVM might be able to fudge
   this anyways by looking for -EFAULT, but then it would mess up SIGBUS
   vs. SIGSEGV.

 - KVM can provide more relevant information than the signal path, e.g.
   guest RIP and GPA. This probably isn't useful for your use case, but
   for debug and other use cases it can be very helpful.

 - The error and its info are synchronous and delivered together (on
   exit to userspace), instead of being split across KVM and the signal
   handling.

 - This behavior needs to be opt-in to avoid breaking KVM's (awful) ABI,
   but we might be able to get away with squeezing the extra info into
   vcpu->run even if userspace doesn't opt in (though that doesn't mean
   userspace will do anything with it).

 - I hate signal handling (ok, not a legitimate reason).

The big downside is that implementing the synchronous reporting would
either need to be done separately for every KVM architecture, or would
need to touch every arch if done generically. I haven't looked at other
architectures for this specific issue, so I don't know which of those
routes would be least awful.
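To make that concrete, the uapi side could look something like the
below. To be clear, none of this exists today: KVM_EXIT_MEM_FAULT,
struct kvm_mem_fault_exit, and its field names are made up for
illustration, and just need to stay consistent with whatever the arch
code fills in (they match the incomplete x86 sketch that follows).

/* Hypothetical additions to include/uapi/linux/kvm.h. */
struct kvm_mem_fault_exit {
	__u64 userspace_address;	/* hva that faulted */
	__u64 guest_physical_address;	/* gpa KVM was trying to map */
	__u64 guest_rip;		/* guest instruction pointer, for debug */
};

plus a "struct kvm_mem_fault_exit mem_fault;" member in struct kvm_run's
exit union, a KVM_EXIT_MEM_FAULT exit reason, and a capability so that
userspace can opt in.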
A very incomplete patch for x86 would look something like:

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 0144c40d09c7..2d4d32425c49 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2875,8 +2875,11 @@ static void kvm_send_hwpoison_signal(unsigned long address, struct task_struct *
 	send_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, PAGE_SHIFT, tsk);
 }
 
-static int kvm_handle_bad_page(struct kvm_vcpu *vcpu, gfn_t gfn, kvm_pfn_t pfn)
+static int kvm_handle_bad_page(struct kvm_vcpu *vcpu, gpa_t gpa, hva_t hva,
+			       kvm_pfn_t pfn)
 {
+	struct kvm_mem_fault_exit *fault = &vcpu->run->mem_fault;
+
 	/*
 	 * Do not cache the mmio info caused by writing the readonly gfn
 	 * into the spte otherwise read access on readonly gfn also can
@@ -2886,25 +2889,32 @@ static int kvm_handle_bad_page(struct kvm_vcpu *vcpu, gfn_t gfn, kvm_pfn_t pfn)
 		return RET_PF_EMULATE;
 
 	if (pfn == KVM_PFN_ERR_HWPOISON) {
-		kvm_send_hwpoison_signal(kvm_vcpu_gfn_to_hva(vcpu, gfn), current);
+		kvm_send_hwpoison_signal(hva, current);
 		return RET_PF_RETRY;
 	}
 
+	fault->userspace_address = hva;
+	fault->guest_physical_address = gpa;
+	fault->guest_rip = kvm_rip_read(vcpu);
+
+	if (vcpu->kvm->arch.mem_fault_reporting_enabled)
+		return KVM_EXIT_MEM_FAULT;
+
 	return -EFAULT;
 }
 
-static bool handle_abnormal_pfn(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
-				kvm_pfn_t pfn, unsigned int access,
+static bool handle_abnormal_pfn(struct kvm_vcpu *vcpu, gva_t gva, gpa_t gpa,
+				hva_t hva, kvm_pfn_t pfn, unsigned int access,
 				int *ret_val)
 {
 	/* The pfn is invalid, report the error! */
 	if (unlikely(is_error_pfn(pfn))) {
-		*ret_val = kvm_handle_bad_page(vcpu, gfn, pfn);
+		*ret_val = kvm_handle_bad_page(vcpu, gpa, hva, pfn);
 		return true;
 	}
 
 	if (unlikely(is_noslot_pfn(pfn))) {
-		vcpu_cache_mmio_info(vcpu, gva, gfn,
+		vcpu_cache_mmio_info(vcpu, gva, gpa >> PAGE_SHIFT,
 				     access & shadow_mmio_access_mask);
 		/*
 		 * If MMIO caching is disabled, emulate immediately without
@@ -3746,7 +3756,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
 				 write, &map_writable))
 		return RET_PF_RETRY;
 
-	if (handle_abnormal_pfn(vcpu, is_tdp ? 0 : gpa, gfn, pfn, ACC_ALL, &r))
+	if (handle_abnormal_pfn(vcpu, is_tdp ? 0 : gpa, gpa, hva, pfn, ACC_ALL, &r))
 		return r;
 
 	r = RET_PF_RETRY;
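On the userspace side, the VMM's run loop would then consume the exit
along these lines. Again, KVM_EXIT_MEM_FAULT and run->mem_fault are the
hypothetical names from above, and page_in() is a stand-in for however
the VMM resolves the fault, e.g. a UFFDIO_COPY on the userfaultfd
covering the region:

#include <errno.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* VMM-specific hook that populates the faulting page; 0 on success. */
extern int page_in(unsigned long long hva);

static void vcpu_loop(int vcpu_fd, struct kvm_run *run)
{
	for (;;) {
		if (ioctl(vcpu_fd, KVM_RUN, 0) < 0 && errno != EINTR)
			return;

		switch (run->exit_reason) {
		case KVM_EXIT_MEM_FAULT:
			fprintf(stderr, "mem fault: hva=0x%llx gpa=0x%llx rip=0x%llx\n",
				run->mem_fault.userspace_address,
				run->mem_fault.guest_physical_address,
				run->mem_fault.guest_rip);
			if (page_in(run->mem_fault.userspace_address))
				return;		/* couldn't resolve, bail */
			break;			/* resolved, re-enter the guest */
		default:
			return;			/* other exit reasons elided */
		}
	}
}

The fault is reported exactly once, on the exit, so there is no signal
handler to coordinate with; the vCPU thread simply fixes up the mapping
and re-enters the guest.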