On Mon, Mar 20, 2023 at 8:53 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote: > > On Fri, Mar 17, 2023, Anish Moorthy wrote: > > On Fri, Mar 17, 2023 at 2:50 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote: > > > I wonder if we can get away with returning -EFAULT, but still filling vcpu->run > > > with KVM_EXIT_MEMORY_FAULT and all the other metadata. That would likely simplify > > > the implementation greatly, and would let KVM fill vcpu->run unconditonally. KVM > > > would still need a capability to advertise support to userspace, but userspace > > > wouldn't need to opt in. I think this may have been my very original though, and > > > I just never actually wrote it down... > > > > Oh, good to know that's actually an option. I thought of that too, but > > assumed that returning a negative error code was a no-go for a proper > > vCPU exit. But if that's not true then I think it's the obvious > > solution because it precludes any uncaught behavior-change bugs. > > > > A couple of notes > > 1. Since we'll likely miss some -EFAULT returns, we'll need to make > > sure that the user can check for / doesn't see a stale > > kvm_run::memory_fault field when a missed -EFAULT makes it to > > userspace. It's a small and easy-to-fix detail, but I thought I'd > > point it out. > > Ya, this is the main concern for me as well. I'm not as confident that it's > easy-to-fix/avoid though. > > > 2. I don't think this would simplify the series that much, since we > > still need to find the call sites returning -EFAULT to userspace and > > populate memory_fault only in those spots to avoid populating it for > > -EFAULTs which don't make it to userspace. > > Filling kvm_run::memory_fault even if KVM never exits to userspace is perfectly > ok. It's not ideal, but it's ok. > > > We *could* relax that condition and just document that memory_fault should be > > ignored when KVM_RUN does not return -EFAULT... but I don't think that's a > > good solution from a coder/maintainer perspective. > > You've got things backward. memory_fault _must_ be ignored if KVM doesn't return > the associated "magic combo", where the magic value is either "0+KVM_EXIT_MEMORY_FAULT" > or "-EFAULT+KVM_EXIT_MEMORY_FAULT". > > Filling kvm_run::memory_fault but not exiting to userspace is ok because userspace > never sees the data, i.e. userspace is completely unaware. This behavior is not > ideal from a KVM perspective as allowing KVM to fill the kvm_run union without > exiting to userspace can lead to other bugs, e.g. effective corruption of the > kvm_run union, but at least from a uABI perspective, the behavior is acceptable. Actually, I don't think the idea of filling in kvm_run.memory_fault for -EFAULTs which don't make it to userspace works at all. Consider the direct_map function, which bubbles its -EFAULT to kvm_mmu_do_page_fault. kvm_mmu_do_page_fault is called from both kvm_arch_async_page_ready (which ignores the return value), and by kvm_mmu_page_fault (where the return value does make it to userspace). Populating kvm_run.memory_fault anywhere in or under kvm_mmu_do_page_fault seems an immediate no-go, because a wayward kvm_arch_async_page_ready could (presumably) overwrite/corrupt an already-set kvm_run.memory_fault / other kvm_run field. That in turn looks problematic for the memory-fault-exit-on-fast-gup-failure part of this series, because there are at least a couple of cases for which kvm_mmu_do_page_fault will -EFAULT. One is the early-efault-on-fast-gup-failure case which was the original purpose of this series. Another is a -EFAULT from FNAME(fetch) (passed up through FNAME(page_fault)). There might be other cases as well. But unless userspace can/should resolve *all* such -EFAULTs in the same manner, a kvm_run.memory_fault populated in "kvm_mmu_page_fault" wouldn't be actionable. At least, not without a whole lot of plumbing code to make it so. Sean, am I missing anything here?