On Tue, May 9, 2023 at 3:19 PM David Matlack <dmatlack@xxxxxxxxxx> wrote: > > On Wed, Apr 12, 2023 at 09:34:48PM +0000, Anish Moorthy wrote: > > Upon receiving an annotated EFAULT, userspace may take appropriate > > action to resolve the failed access. For instance, this might involve a > > UFFDIO_CONTINUE or MADV_POPULATE_WRITE in the context of uffd-based live > > migration postcopy. > > As implemented, I think it will be prohibitively expensive if not > impossible for userspace to determine why KVM is returning EFAULT when > KVM_CAP_ABSENT_MAPPING_FAULT is enabled, which means userspace can't > decide the correct action to take (try to resolve or bail). > > Consider the direct_map() case in patch in PATCH 15. The only way to hit > that condition is a logic bug in KVM or data corruption. There isn't > really anything userspace can do to handle this situation, and it has no > way to distinguish that from faults to due absent mappings. > > We could end up hitting cases where userspace loops forever doing > KVM_RUN, EFAULT, UFFDIO_CONTINUE/MADV_POPULATE_WRITE, KVM_RUN, EFAULT... > > Maybe we should just change direct_map() to use KVM_BUG() and return > something other than EFAULT. But the general problem still exists and > even if we have confidence in all the current EFAULT sites, we don't have > much protection against someone adding an EFAULT in the future that > userspace can't handle. Hmm, I had been operating under the assumption that userspace would always have been able to make the memory access succeed somehow- I (naively) didn't count on some guest memory access errors being unrecoverable. If that's the case, then we're back to needing some way to distinguish the new faults/exits emitted by user_mem_abort/kvm_faultin_pfn with the ABSENT_MAPPING_FAULT cap enabled :/ Let me paste in a bit of what Sean said to refute the idea of a special page-fault-failure set in those spots. (from https://lore.kernel.org/kvm/ZBoIzo8FGxSyUJ2I@xxxxxxxxxx/) On Tue, Mar 21, 2023 at 12:43 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote: > > Setting a flag that essentially says "failure when handling a guest page fault" > is problematic on multiple fronts. Tying the ABI to KVM's internal implementation > is not an option, i.e. the ABI would need to be defined as "on page faults from > the guest". And then the resulting behavior would be non-deterministic, e.g. > userspace would see different behavior if KVM accessed a "bad" gfn via emulation > instead of in response to a guest page fault. And because of hardware TLBs, it > would even be possible for the behavior to be non-deterministic on the same > platform running the same guest code (though this would be exteremly unliklely > in practice). > > And even if userspace is ok with only handling guest page faults_today_, I highly > doubt that will hold forever. I.e. at some point there will be a use case that > wants to react to uaccess failures on fast-only memslots. > > Ignoring all of those issues, simplify flagging "this -EFAULT occurred when > handling a guest page fault" isn't precise enough for userspace to blindly resolve > the failure. Even if KVM went through the trouble of setting information if and > only if get_user_page_fast_only() failed while handling a guest page fault, > userspace would still need/want a way to verify that the failure was expected and > can be resolved, e.g. to guard against userspace bugs due to wrongly unmapping > or mprotecting a page. I wonder, how much of this problem comes down to my description/name (I suggested MEMFAULT_REASON_PAGE_FAULT_FAILURE) for the flag? I see Sean's concerns of the behavior issues when fast-only pages are accessed via guest mode or via emulation/uaccess. What if the description of the fast-only fault cap was tightened to something like "generates vcpu faults/exits in response to *EPT/SLAT violations* which cannot be mapped by present userspace page table entries?" I think that would eliminate the emulation/uaccess issues (though I may be wrong, so please let me know). Of course, by the time we get to kvm_faultin_pfn we don't know that we're faulting pages in response to an EPT violation... but if the idea makes sense then that might justify some plumbing code.