On Tue, Jun 06, 2023 at 07:03:45PM +0000, Ackerley Tng <ackerleytng@xxxxxxxxxx> wrote:
> Hello,
>
> This patchset builds upon a soon-to-be-published WIP patchset that Sean
> published at https://github.com/sean-jc/linux/tree/x86/kvm_gmem_solo,
> mentioned at [1].
>
> The tree can be found at:
> https://github.com/googleprodkernel/linux-cc/tree/gmem-hugetlb-rfc-v1
>
> In this patchset, hugetlb support for KVM's guest_mem (aka gmem) is
> introduced, allowing VM private memory (for confidential computing) to be
> backed by hugetlb pages.
>
> guest_mem provides userspace with a handle, with which userspace can
> allocate and deallocate memory for confidential VMs without mapping the
> memory into userspace.
>
> Why use hugetlb instead of introducing a new allocator, like gmem does for
> 4K and transparent hugepages?
>
> + hugetlb provides the following useful functionality, which would
>   otherwise have to be reimplemented:
>   + Allocation of hugetlb pages at boot time, including
>     + Parsing of kernel boot parameters to configure hugetlb
>     + Tracking of usage in hstate
>     + gmem will share the same system-wide pool of hugetlb pages, so users
>       don't have to have separate pools for hugetlb and gmem
>   + Page accounting with subpools
>     + hugetlb pages are tracked in subpools, which gmem uses to reserve
>       pages from the global hstate
>   + Memory charging
>     + hugetlb provides code that charges memory to cgroups
>   + Reporting: hugetlb usage and availability are available at
>     /proc/meminfo, etc.
>
> The first 11 patches in this patchset are a series of refactorings to
> decouple hugetlb and hugetlbfs.
>
> The central thread binding the refactoring is that some functions (like
> inode_resv_map(), inode_subpool(), inode_hstate(), etc.) rely on a
> hugetlbfs concept: that the resv_map, subpool, and hstate are stored in
> specific fields of a hugetlb inode.
>
> Refactoring to parametrize functions by hstate, subpool, and resv_map will
> allow hugetlb to be used by gmem and in other places where these data
> structures aren't necessarily stored in the same positions in the inode.
>
> The refactoring proposed here is just the minimum required to get a
> proof-of-concept working with gmem. I would like to get opinions on this
> approach before doing further refactoring. (See TODOs)
>
> TODOs:
>
> + hugetlb/hugetlbfs refactoring
>   + remove_inode_hugepages() no longer needs to be exposed; it is
>     hugetlbfs specific and used only in inode.c
>   + remove_mapping_hugepages(), remove_inode_single_folio(), and
>     hugetlb_unreserve_pages() shouldn't need to take an inode as a
>     parameter
>   + Updating inode->i_blocks can be refactored into a separate function
>     and called from hugetlbfs and gmem
>   + alloc_hugetlb_folio_from_subpool() shouldn't need to be parametrized
>     by vma
>   + hugetlb_reserve_pages() should be refactored to be symmetric with
>     hugetlb_unreserve_pages()
>     + It should be parametrized by resv_map
>     + alloc_hugetlb_folio_from_subpool() could perhaps use
>       hugetlb_reserve_pages()?
> + gmem
>   + Figure out if resv_map should be used by gmem at all
>   + Probably needs more refactoring to decouple resv_map from hugetlb
>     functions

Hi. If kvm gmem is compiled as a kernel module, many symbols fail to link.
You need to add EXPORT_SYMBOL{,_GPL} for the exported symbols. Or should
gmem be built into the kernel instead of as a module? (A sketch of the
export pattern is at the end of this mail.)

Thanks,

> Questions for the community:
>
> 1. In this patchset, every gmem file backed with hugetlb is given a new
>    subpool. Is that desirable?
>    + In hugetlbfs, a subpool always belongs to a mount, and hugetlbfs has
>      one mount per hugetlb size (2M, 1G, etc.)
>    + memfd_create(MFD_HUGETLB) effectively returns a full hugetlbfs file,
>      so it (rightfully) uses the hugetlbfs kernel mounts and their subpools
>    + I gave each file a subpool mostly to speed up implementation and
>      still be able to reserve hugetlb pages from the global hstate based
>      on the gmem file size.
>    + gmem, unlike hugetlbfs, isn't meant to be a full filesystem, so
>      + Should there be multiple mounts, one for each hugetlb size?
>      + Will the mounts be initialized on boot or on first gmem file
>        creation?
>      + Or is one subpool per gmem file fine?
> 2. Should resv_map be used for gmem at all, since gmem doesn't allow
>    userspace reservations?
>
> [1] https://lore.kernel.org/lkml/ZEM5Zq8oo+xnApW9@xxxxxxxxxx/
>
> ---
>
> Ackerley Tng (19):
>   mm: hugetlb: Expose get_hstate_idx()
>   mm: hugetlb: Move and expose hugetlbfs_zero_partial_page
>   mm: hugetlb: Expose remove_inode_hugepages
>   mm: hugetlb: Decouple hstate, subpool from inode
>   mm: hugetlb: Allow alloc_hugetlb_folio() to be parametrized by subpool
>     and hstate
>   mm: hugetlb: Provide hugetlb_filemap_add_folio()
>   mm: hugetlb: Refactor vma_*_reservation functions
>   mm: hugetlb: Refactor restore_reserve_on_error
>   mm: hugetlb: Use restore_reserve_on_error directly in filesystems
>   mm: hugetlb: Parametrize alloc_hugetlb_folio_from_subpool() by
>     resv_map
>   mm: hugetlb: Parametrize hugetlb functions by resv_map
>   mm: truncate: Expose preparation steps for truncate_inode_pages_final
>   KVM: guest_mem: Refactor kvm_gmem fd creation to be in layers
>   KVM: guest_mem: Refactor cleanup to separate inode and file cleanup
>   KVM: guest_mem: hugetlb: initialization and cleanup
>   KVM: guest_mem: hugetlb: allocate and truncate from hugetlb
>   KVM: selftests: Add basic selftests for hugetlbfs-backed guest_mem
>   KVM: selftests: Support various types of backing sources for private
>     memory
>   KVM: selftests: Update test for various private memory backing source
>     types
>
>  fs/hugetlbfs/inode.c                          | 102 ++--
>  include/linux/hugetlb.h                       |  86 ++-
>  include/linux/mm.h                            |   1 +
>  include/uapi/linux/kvm.h                      |  25 +
>  mm/hugetlb.c                                  | 324 +++++++-----
>  mm/truncate.c                                 |  24 +-
>  .../testing/selftests/kvm/guest_memfd_test.c  |  33 +-
>  .../testing/selftests/kvm/include/test_util.h |  14 +
>  tools/testing/selftests/kvm/lib/test_util.c   |  74 +++
>  .../kvm/x86_64/private_mem_conversions_test.c |  38 +-
>  virt/kvm/guest_mem.c                          | 488 ++++++++++++++----
>  11 files changed, 882 insertions(+), 327 deletions(-)
>
> --
> 2.41.0.rc0.172.g3f132b7071-goog

--
Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
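
P.S. A minimal sketch of the export pattern I mean, assuming a modular
guest_mem keeps calling the hugetlb helpers named in the cover letter. The
symbol list below is illustrative, not complete; which symbols actually need
exporting depends on what virt/kvm/guest_mem.c ends up calling, and whether
EXPORT_SYMBOL or EXPORT_SYMBOL_GPL is appropriate is a separate discussion.

    /*
     * mm/hugetlb.c (sketch): each export sits right after the
     * corresponding function definition.  Without these, building the
     * gmem code as a module fails at modpost with "undefined symbol"
     * errors for every cross-boundary call.
     */
    EXPORT_SYMBOL_GPL(alloc_hugetlb_folio_from_subpool);
    EXPORT_SYMBOL_GPL(hugetlb_reserve_pages);
    EXPORT_SYMBOL_GPL(hugetlb_unreserve_pages);

    /* wherever the moved helper ends up (mm/hugetlb.c per patch 2) */
    EXPORT_SYMBOL_GPL(hugetlbfs_zero_partial_page);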