On Tue, Dec 6, 2022 at 12:41 PM James Houghton <jthoughton@xxxxxxxxxx> wrote: > On Tue, Dec 6, 2022 at 1:01 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote: > > Can you elaborate on what makes it better? Or maybe generate a list of pros and > > cons? I can think of (dis)advantages for both approaches, but I haven't identified > > anything that would be a blocking issue for either approach. Doesn't mean there > > isn't one or more blocking issues, just that I haven't thought of any :-) > > Let's see.... so using no-slow-GUP over no UFFD waiting: > - No need to take mmap_lock in mem fault path. > - Change the relevant __gfn_to_pfn_memslot callers > (kvm_faultin_pfn/user_mem_abort/others?) to set `atomic = true` if the > new CAP is used. > - No need for a new PF_NO_UFFD_WAIT (would be toggled somewhere > in/near kvm_faultin_pfn/user_mem_abort). > - Userspace has to indirectly figure out the state of the page tables > to know what action to take (which introduces some weirdness, like if > anyone MADV_DONTNEEDs some guest memory, we need to know). I'm no expert but I believe a guest access to MADV_DONTNEED'd GFN would just cause a new page to be allocated by the kernel. So I think userspace can still blindly do MADV_POPULATE_WRITE in this case. Were there any other scenarios you had in mind? > - While userfaultfd is registered (so like during post-copy), any > hva_to_pfn() calls that were resolvable with slow GUP before (without > dropping into handle_userfault()) will now need to be resolved by > userspace manually with a call to MADV_POPULATE_WRITE. This extra trip > to userspace could slow things down. Is there any way to enable fast-gup to identify when a PTE is not present due to userfaultfd specifically without taking the mmap_lock (e.g. using an unused bit in the PTE)? Then we could avoid extra trips to userspace for MADV_POPULATE_WRITE. > > Both of these seem pretty simple to implement in the kernel; the most > complicated part is just returning KVM_EXIT_MEMORY_FAULT in more > places / for other architectures (I care about x86 and arm64). > > Right now both approaches seem fine to me. Not having to take the > mmap_lock in the fault path, while being such a minor difference now, > could be a huge benefit if we can later get around to making > UFFDIO_CONTINUE not need the mmap lock. Disregarding that, not > requiring userspace to guess the state of the page tables seems > helpful (less bug-prone, I guess). > > > > > > When KVM_RUN exits: > > > - If we haven't UFFDIO_CONTINUE'd yet, do that now and restart KVM_RUN. > > > - If we have, then something bad has happened. Slow GUP already ran > > > and failed, so we need to treat this in the same way we treat a > > > MADV_POPULATE_WRITE failure above: userspace might just want to crash > > > (or inject a memory error or something). > > > > > > - James