On Fri, Mar 08, 2024 at 03:22:50PM -0800, Sean Christopherson wrote:
> On Fri, Mar 08, 2024, James Gowans wrote:
> > However, memfd_secret doesn't work out of the box for KVM guest memory;
> > the main reason seems to be that the GUP path is intentionally disabled
> > for memfd_secret, so if we use a memfd_secret-backed VMA for a memslot
> > then KVM is not able to fault the memory in. If it's been pre-faulted in
> > by userspace then it seems to work.
>
> Huh, that _shouldn't_ work. The folio_is_secretmem() in gup_pte_range() is
> supposed to prevent the "fast gup" path from getting secretmem pages.

I suspect this works because KVM only calls gup on faults, and if the memory
was pre-faulted via memfd_secret there are no faults and hence no gups from
KVM.

> > With this in mind, what's the best way to solve getting guest RAM out of
> > the direct map? Is memfd_secret integration with KVM the way to go, or
> > should we build a solution on top of guest_memfd, for example via some
> > flag that causes it to leave memory in the host userspace's page tables,
> > but removes it from the direct map?
>
> memfd_secret obviously gets you a PoC much faster, but in the long term I'm
> quite sure you'll be fighting memfd_secret all the way. E.g. it's not
> dumpable, it deliberately allocates at 4KiB granularity (though I suspect
> the bug you found means that it can be inadvertently mapped with 2MiB
> hugepages), and it has no line of sight to taking userspace out of the
> equation, etc.
>
> With guest_memfd on the other hand, everyone contributing to and
> maintaining it has goals that are *very* closely aligned with what you
> want to do.

I agree with Sean: guest_memfd seems the better interface to use. It is
integrated with KVM by design, and removing guest memory from the direct map
looks like a natural enhancement to guest_memfd.
Unless I'm missing something, for a quick-and-dirty PoC it'll be a one-liner
that adds set_memory_np() to kvm_gmem_get_folio(), and then figuring out
what to do with virtio :)

--
Sincerely yours,
Mike.
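For the archives, the one-liner I have in mind is roughly the below. This is
an untested sketch against my recollection of kvm_gmem_get_folio() in
virt/kvm/guest_memfd.c (the surrounding context is paraphrased from memory,
so treat it as pseudocode); it's x86-only, has no error handling, and nothing
here restores the mapping with set_memory_p() before the folio is freed, which
a real version would have to do:

```
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
 		for (i = 0; i < nr_pages; i++)
 			clear_highpage(folio_page(folio, i));

 		folio_mark_uptodate(folio);
+		/*
+		 * PoC only: drop the freshly allocated guest folio from the
+		 * kernel direct map by clearing _PAGE_PRESENT on its direct
+		 * map PTEs. Ignores the return value and leaks the unmapped
+		 * range on the free path.
+		 */
+		set_memory_np((unsigned long)page_address(folio_page(folio, 0)),
+			      nr_pages);
 	}
```

The virtio problem remains regardless: anything in the kernel that wants to
touch guest memory through the direct map will fault once the pages are
unmapped.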