This patchset is also available at:

  https://github.com/amdese/linux/commits/snp-prepare-thp-rfc1

and is based on top of Paolo's kvm-coco-queue-2024-11 tag, which includes a
snapshot of his patches[1] to provide tracking of whether or not sub-pages
of a huge folio need to have kvm_arch_gmem_prepare() hooks issued before
guest access:

  d55475f23cea KVM: gmem: track preparedness a page at a time
  64b46ca6cd6d KVM: gmem: limit hole-punching to ranges within the file
  17df70a5ea65 KVM: gmem: add a complete set of functions to query page preparedness
  e3449f6841ef KVM: gmem: allocate private data for the gmem inode

  [1] https://lore.kernel.org/lkml/20241108155056.332412-1-pbonzini@xxxxxxxxxx/

This series addresses some of the pending review comments for those patches
(feel free to squash/rework as needed), and implements a first real user in
the form of a reworked version of Sean's original 2MB THP support for gmem.

It is still somewhat up in the air whether gmem should support THP at all,
rather than moving straight to 2MB/1GB hugepages in the form of something
like HugeTLB folios[2] or the lower-level PFN range allocator presented by
Yu Zhao during the guest_memfd call last week.

The main argument against THP, as I understand it, is that THPs will become
split over time due to hole-punching and will rarely have an opportunity to
get rebuilt, due to the lack of memory migration support in current CoCo
hypervisor implementations like SNP (and adding migration support to resolve
that would not necessarily result in a net gain performance-wise). The
current plan for SNP, as discussed during the first guest_memfd call, is to
implement something similar to 2MB HugeTLB and disallow hole-punching at
sub-2MB granularity.

However, there have also been some discussions during recent PUCK calls
where the KVM maintainers have still expressed some interest in pulling in
gmem THP support in a more official capacity. The thinking there is that
hole-punching is a userspace policy, and userspace could in theory avoid
hole-punching sub-2MB GFN ranges to avoid degradation over time. And if
there's a desire to enforce this from the kernel side by blocking sub-2MB
hole-punching from the host side, that would provide semantics/behavior
similar to the 2MB HugeTLB-like approach above. So maybe there is still some
room for discussion about these approaches.

Outside of that, there are a number of other development areas where it
would be useful to at least have some experimental 2MB support in place so
those efforts can be pursued in parallel, such as the preparedness tracking
touched on here, and exploring how that will intersect with other efforts
like using gmem for both shared and private memory, mmap support, the
guest_memfd library, etc. My hope is that this approach can be useful for
that purpose at least, even if only as an out-of-tree stop-gap.

Thoughts/comments welcome!

[2] https://lore.kernel.org/all/cover.1728684491.git.ackerleytng@xxxxxxxxxx/

Testing
-------

Currently, this series does not enable 2M support by default; it can instead
be switched on/off dynamically via a module parameter:

  echo 1 >/sys/module/kvm/parameters/gmem_2m_enabled
  echo 0 >/sys/module/kvm/parameters/gmem_2m_enabled

This can be useful for simulating things like host memory pressure, where we
start getting a mix of 4K/2MB allocations. I've used this to help test that
the preparedness tracking still handles things properly in these situations.
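For reference, below is a minimal sketch of how a runtime-writable parameter
like this is typically declared; the variable name, default, and placement
shown here are illustrative assumptions, not necessarily how the series
wires it up:

  /*
   * Illustrative sketch only -- the actual wiring in the series may differ.
   * A writable bool module parameter in code built into kvm.ko (e.g.
   * virt/kvm/guest_memfd.c) shows up as
   * /sys/module/kvm/parameters/gmem_2m_enabled, so root can toggle 2M
   * allocations at runtime with the echo commands above.
   */
  #include <linux/moduleparam.h>
  #include <linux/types.h>

  static bool gmem_2m_enabled;	/* off by default, as described above */
  module_param(gmem_2m_enabled, bool, 0644);
  MODULE_PARM_DESC(gmem_2m_enabled, "Allow 2M (PMD-sized) guest_memfd folios");

The allocation path would then consult the flag when deciding whether to
attempt a 2M (PMD-order) folio before falling back to order-0 pages.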
But if we do decide to pull THP support upstream, it would make more sense
to drop the parameter completely.

----------------------------------------------------------------
Michael Roth (4):
      KVM: gmem: Don't rely on __kvm_gmem_get_pfn() for preparedness
      KVM: gmem: Don't clear pages that have already been prepared
      KVM: gmem: Hold filemap invalidate lock while allocating/preparing folios
      KVM: SEV: Improve handling of large ranges in gmem prepare callback

Sean Christopherson (1):
      KVM: Add hugepage support for dedicated guest memory

 arch/x86/kvm/svm/sev.c   | 163 ++++++++++++++++++++++++++------------------
 include/linux/kvm_host.h |   2 +
 virt/kvm/guest_memfd.c   | 173 ++++++++++++++++++++++++++++++++++-------------
 virt/kvm/kvm_main.c      |   4 ++
 4 files changed, 228 insertions(+), 114 deletions(-)