Re: [RFC PATCH 00/19] hugetlb support for KVM guest_mem

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jun 06, 2023 at 07:03:45PM +0000,
Ackerley Tng <ackerleytng@xxxxxxxxxx> wrote:

> Hello,
> 
> This patchset builds upon a soon-to-be-published WIP patchset that Sean
> published at https://github.com/sean-jc/linux/tree/x86/kvm_gmem_solo, mentioned
> at [1].
> 
> The tree can be found at:
> https://github.com/googleprodkernel/linux-cc/tree/gmem-hugetlb-rfc-v1
> 
> In this patchset, hugetlb support for KVM's guest_mem (aka gmem) is introduced,
> allowing VM private memory (for confidential computing) to be backed by hugetlb
> pages.
> 
> guest_mem provides userspace with a handle, with which userspace can allocate
> and deallocate memory for confidential VMs without mapping the memory into
> userspace.
> 
> Why use hugetlb instead of introducing a new allocator, like gmem does for 4K
> and transparent hugepages?
> 
> + hugetlb provides the following useful functionality, which would otherwise
>   have to be reimplemented:
>     + Allocation of hugetlb pages at boot time, including
>         + Parsing of kernel boot parameters to configure hugetlb
>         + Tracking of usage in hstate
>         + gmem will share the same system-wide pool of hugetlb pages, so users
>           don't have to have separate pools for hugetlb and gmem
>     + Page accounting with subpools
>         + hugetlb pages are tracked in subpools, which gmem uses to reserve
>           pages from the global hstate
>     + Memory charging
>         + hugetlb provides code that charges memory to cgroups
>     + Reporting: hugetlb usage and availability are available at /proc/meminfo,
>       etc
> 
> The first 11 patches in this patchset is a series of refactoring to decouple
> hugetlb and hugetlbfs.
> 
> The central thread binding the refactoring is that some functions (like
> inode_resv_map(), inode_subpool(), inode_hstate(), etc) rely on a hugetlbfs
> concept, that the resv_map, subpool, hstate, are in a specific field in a
> hugetlb inode.
> 
> Refactoring to parametrize functions by hstate, subpool, resv_map will allow
> hugetlb to be used by gmem and in other places where these data structures
> aren't necessarily stored in the same positions in the inode.
> 
> The refactoring proposed here is just the minimum required to get a
> proof-of-concept working with gmem. I would like to get opinions on this
> approach before doing further refactoring. (See TODOs)
> 
> TODOs:
> 
> + hugetlb/hugetlbfs refactoring
>     + remove_inode_hugepages() no longer needs to be exposed, it is hugetlbfs
>       specific and used only in inode.c
>     + remove_mapping_hugepages(), remove_inode_single_folio(),
>       hugetlb_unreserve_pages() shouldn't need to take inode as a parameter
>         + Updating inode->i_blocks can be refactored to a separate function and
>           called from hugetlbfs and gmem
>     + alloc_hugetlb_folio_from_subpool() shouldn't need to be parametrized by
>       vma
>     + hugetlb_reserve_pages() should be refactored to be symmetric with
>       hugetlb_unreserve_pages()
>         + It should be parametrized by resv_map
>         + alloc_hugetlb_folio_from_subpool() could perhaps use
>           hugetlb_reserve_pages()?
> + gmem
>     + Figure out if resv_map should be used by gmem at all
>         + Probably needs more refactoring to decouple resv_map from hugetlb
>           functions

Hi. If kvm gmem is compiled as kernel module, many symbols are failed to link.
You need to add EXPORT_SYMBOL{,_GPL} for exported symbols.
Or compile it to kernel instead of module?

Thanks,

> Questions for the community:
> 
> 1. In this patchset, every gmem file backed with hugetlb is given a new
>    subpool. Is that desirable?
>     + In hugetlbfs, a subpool always belongs to a mount, and hugetlbfs has one
>       mount per hugetlb size (2M, 1G, etc)
>     + memfd_create(MFD_HUGETLB) effectively returns a full hugetlbfs file, so it
>       (rightfully) uses the hugetlbfs kernel mounts and their subpools
>     + I gave each file a subpool mostly to speed up implementation and still be
>       able to reserve hugetlb pages from the global hstate based on the gmem
>       file size.
>     + gmem, unlike hugetlbfs, isn't meant to be a full filesystem, so
>         + Should there be multiple mounts, one for each hugetlb size?
>         + Will the mounts be initialized on boot or on first gmem file creation?
>         + Or is one subpool per gmem file fine?
> 2. Should resv_map be used for gmem at all, since gmem doesn't allow userspace
>    reservations?
> 
> [1] https://lore.kernel.org/lkml/ZEM5Zq8oo+xnApW9@xxxxxxxxxx/
> 
> ---
> 
> Ackerley Tng (19):
>   mm: hugetlb: Expose get_hstate_idx()
>   mm: hugetlb: Move and expose hugetlbfs_zero_partial_page
>   mm: hugetlb: Expose remove_inode_hugepages
>   mm: hugetlb: Decouple hstate, subpool from inode
>   mm: hugetlb: Allow alloc_hugetlb_folio() to be parametrized by subpool
>     and hstate
>   mm: hugetlb: Provide hugetlb_filemap_add_folio()
>   mm: hugetlb: Refactor vma_*_reservation functions
>   mm: hugetlb: Refactor restore_reserve_on_error
>   mm: hugetlb: Use restore_reserve_on_error directly in filesystems
>   mm: hugetlb: Parametrize alloc_hugetlb_folio_from_subpool() by
>     resv_map
>   mm: hugetlb: Parametrize hugetlb functions by resv_map
>   mm: truncate: Expose preparation steps for truncate_inode_pages_final
>   KVM: guest_mem: Refactor kvm_gmem fd creation to be in layers
>   KVM: guest_mem: Refactor cleanup to separate inode and file cleanup
>   KVM: guest_mem: hugetlb: initialization and cleanup
>   KVM: guest_mem: hugetlb: allocate and truncate from hugetlb
>   KVM: selftests: Add basic selftests for hugetlbfs-backed guest_mem
>   KVM: selftests: Support various types of backing sources for private
>     memory
>   KVM: selftests: Update test for various private memory backing source
>     types
> 
>  fs/hugetlbfs/inode.c                          | 102 ++--
>  include/linux/hugetlb.h                       |  86 ++-
>  include/linux/mm.h                            |   1 +
>  include/uapi/linux/kvm.h                      |  25 +
>  mm/hugetlb.c                                  | 324 +++++++-----
>  mm/truncate.c                                 |  24 +-
>  .../testing/selftests/kvm/guest_memfd_test.c  |  33 +-
>  .../testing/selftests/kvm/include/test_util.h |  14 +
>  tools/testing/selftests/kvm/lib/test_util.c   |  74 +++
>  .../kvm/x86_64/private_mem_conversions_test.c |  38 +-
>  virt/kvm/guest_mem.c                          | 488 ++++++++++++++----
>  11 files changed, 882 insertions(+), 327 deletions(-)
> 
> --
> 2.41.0.rc0.172.g3f132b7071-goog

-- 
Isaku Yamahata <isaku.yamahata@xxxxxxxxx>



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux