This patchset is also available at:

  https://github.com/amdese/linux/commits/snp-prepare-thp-rfc1

and is based on top of Paolo's kvm-coco-queue-2024-11 tag, which includes a
snapshot of his patches[1] to provide tracking of whether or not sub-pages
of a huge folio need to have kvm_arch_gmem_prepare() hooks issued before
guest access:

  d55475f23cea KVM: gmem: track preparedness a page at a time
  64b46ca6cd6d KVM: gmem: limit hole-punching to ranges within the file
  17df70a5ea65 KVM: gmem: add a complete set of functions to query page preparedness
  e3449f6841ef KVM: gmem: allocate private data for the gmem inode

  [1] https://lore.kernel.org/lkml/20241108155056.332412-1-pbonzini@xxxxxxxxxx/

This series addresses some of the pending review comments for those patches
(feel free to squash/rework as needed), and implements a first real user in
the form of a reworked version of Sean's original 2MB THP support for gmem.

It is still somewhat up in the air whether gmem should support THP at all,
rather than moving straight to 2MB/1GB hugepages in the form of something
like HugeTLB folios[2] or the lower-level PFN range allocator presented by
Yu Zhao during the guest_memfd call last week.

The main argument against THP, as I understand it, is that THPs will become
split over time due to hole-punching and will rarely have an opportunity to
get rebuilt, due to the lack of memory migration support in current CoCo
hypervisor implementations like SNP (and adding migration support to resolve
that would not necessarily result in a net gain performance-wise). The
current plan for SNP, as discussed during the first guest_memfd call, is to
implement something similar to 2MB HugeTLB and disallow hole-punching at
sub-2MB granularity.

However, there have also been some discussions during recent PUCK calls
where the KVM maintainers have still expressed some interest in pulling in
gmem THP support in a more official capacity. The thinking there is that
hole-punching is a userspace policy, and userspace could in theory avoid
hole-punching sub-2MB GFN ranges to avoid degradation over time. And if
there's a desire to enforce this from the kernel side by blocking sub-2MB
hole-punching from the host side, that would provide semantics/behavior
similar to the 2MB HugeTLB-like approach above. So maybe there is still some
room for discussion about these approaches.

Outside of that, there are a number of other development areas where it
would be useful to at least have some experimental 2MB support in place so
those efforts can be pursued in parallel, such as the preparedness tracking
touched on here, and exploring how that will intersect with other efforts
like using gmem for both shared and private memory, mmap support, the
guest_memfd library, etc. My hope is that this approach can be useful for
that purpose at least, even if only as an out-of-tree stop-gap.

Thoughts/comments welcome!

[2] https://lore.kernel.org/all/cover.1728684491.git.ackerleytng@xxxxxxxxxx/

Testing
-------

Currently, this series does not enable 2M support by default; it can instead
be switched on/off dynamically via a module parameter:

  echo 1 >/sys/module/kvm/parameters/gmem_2m_enabled
  echo 0 >/sys/module/kvm/parameters/gmem_2m_enabled

This can be useful for simulating things like host memory pressure, where we
start getting a mix of 4K/2MB allocations. I've used this to help test that
the preparedness tracking still handles things properly in these situations.
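For reference, below is a minimal sketch of how a runtime-writable parameter
like this is typically declared; the variable name, default, and placement
shown here are illustrative assumptions, not necessarily how the series
wires it up:

  /*
   * Illustrative sketch only -- the actual wiring in the series may differ.
   * A writable bool module parameter in code built into kvm.ko (e.g.
   * virt/kvm/guest_memfd.c) shows up as
   * /sys/module/kvm/parameters/gmem_2m_enabled, so root can toggle 2M
   * allocations at runtime with the echo commands above.
   */
  #include <linux/moduleparam.h>
  #include <linux/types.h>

  static bool gmem_2m_enabled;	/* off by default, as described above */
  module_param(gmem_2m_enabled, bool, 0644);
  MODULE_PARM_DESC(gmem_2m_enabled, "Allow 2M (PMD-sized) guest_memfd folios");

The allocation path would then consult the flag when deciding whether to
attempt a 2M (PMD-order) folio before falling back to order-0 pages.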
But if we do decide to pull THP support upstream, it would make more sense
to drop the parameter completely.

----------------------------------------------------------------
Michael Roth (4):
      KVM: gmem: Don't rely on __kvm_gmem_get_pfn() for preparedness
      KVM: gmem: Don't clear pages that have already been prepared
      KVM: gmem: Hold filemap invalidate lock while allocating/preparing folios
      KVM: SEV: Improve handling of large ranges in gmem prepare callback

Sean Christopherson (1):
      KVM: Add hugepage support for dedicated guest memory

 arch/x86/kvm/svm/sev.c   | 163 ++++++++++++++++++++++++++------------------
 include/linux/kvm_host.h |   2 +
 virt/kvm/guest_memfd.c   | 173 ++++++++++++++++++++++++++++++++++-------------
 virt/kvm/kvm_main.c      |   4 ++
 4 files changed, 228 insertions(+), 114 deletions(-)