On Sat, Mar 8, 2025 at 5:09 PM Vishal Annapurve <vannapurve@xxxxxxxxxx> wrote:
>
> On Wed, Feb 26, 2025 at 12:28 AM Shivank Garg <shivankg@xxxxxxx> wrote:
> >
> > In this patch series:
> > Based on the discussion in the bi-weekly guest_memfd upstream call on
> > 2025-02-20 [4], I have dropped the RFC tag, documented the memory
> > allocation behavior after policy changes, and added selftests.
> >
> > KVM's guest-memfd memory backend currently lacks support for NUMA
> > policy enforcement, causing guest memory allocations to be distributed
> > arbitrarily across host NUMA nodes regardless of the policy specified
> > by the VMM. This occurs because conventional userspace NUMA control
> > mechanisms like mbind() are ineffective with guest-memfd, as the memory
> > isn't directly mapped to userspace when allocations occur.
> >
> > This patch series adds NUMA binding capabilities to guest_memfd-backed
> > KVM guests. It has evolved through several approaches based on
> > community feedback:
> >
> > - v1, v2: Extended the KVM_CREATE_GUEST_MEMFD ioctl to pass a mempolicy.
> > - v3: Introduced an fbind() syscall for VMM memory-placement
> >   configuration.
> > - v4-v6: Current approach using shared_policy support and vm_ops (based
> >   on suggestions from David [1] and the guest_memfd biweekly upstream
> >   call [2]).
> >
> > For SEV-SNP guests, which use the guest-memfd memory backend, NUMA-aware
> > memory placement is essential for optimal performance, particularly for
> > memory-intensive workloads.
> >
> > This series implements proper NUMA policy support for guest-memfd by:
> >
> > 1. Adding mempolicy-aware allocation APIs to the filemap layer.
>
> I have been thinking more about this after the last guest_memfd
> upstream call on March 6th.
>
> To allow 1G page support with guest_memfd [1] without incurring
> significant memory overhead, it's important to support in-place memory
> conversion, with private hugepages getting split/merged upon conversion.
> Private pages can be seamlessly split/merged only if the refcounts of
> all subpages are frozen; the most effective way to achieve and enforce
> this is simply not to have struct pages for private memory. All
> guest_memfd private range users (including the IOMMU [2] in the future)
> can request pfns for offsets and get notified about invalidation when
> the pfns go away.
>
> Not having struct pages for private memory also provides additional
> benefits:
> * Significantly less memory overhead for handling split/merge operations
>   - With struct pages around, every split of a 1G page needs struct page
>     allocations for 512 * 512 4K pages in the worst case.
> * Enables the roadmap for PFN range allocators in the backend and use
>   cases like KHO [3] that target use of memory without struct pages.
>
> IIRC, the filemap was initially used as a matter of convenience for the
> initial guest_memfd implementation.
>
> As David pointed out in the call, to get rid of struct pages for private
> memory ranges, the filemap/pagecache needs to be replaced by a
> lightweight mechanism that tracks the offset -> pfn mapping for private
> memory ranges, while still keeping the filemap/pagecache for shared
> memory ranges (it's still needed to allow GUP use cases).

Going one step further: if we support folio->mapping and possibly any
other needed bits, while tracking the folios for shared memory ranges
together with the private memory pfns in a separate "gmem_cache" so that
core-mm interaction stays compatible, could that allow pursuing the
direction of not needing the filemap at all?
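To make that a bit more concrete, here is a rough sketch of what the
private-range side of such a "gmem_cache" might look like: file offsets
mapped to bare pfns in an xarray, with no struct pages involved and the
shared ranges still living in the filemap as folios. This is purely
illustrative, not a concrete proposal; all the names (gmem_pfn_cache,
gmem_cache_*) are made up, and it glosses over locking, hugepage
granularity and refcount freezing:

/*
 * Illustrative sketch only: a per-gmem "cache" that tracks private
 * ranges as offset -> pfn entries in an xarray, while shared ranges
 * keep using folios in the filemap.
 */
#include <linux/xarray.h>
#include <linux/gfp.h>
#include <linux/types.h>

struct gmem_pfn_cache {
	struct xarray pfns;	/* page offset -> pfn, stored as xa values */
};

static inline void gmem_cache_init(struct gmem_pfn_cache *cache)
{
	xa_init(&cache->pfns);
}

/* Record the pfn backing a private page offset (pfns fit in xa values). */
static inline int gmem_cache_store(struct gmem_pfn_cache *cache,
				   pgoff_t index, unsigned long pfn)
{
	return xa_err(xa_store(&cache->pfns, index, xa_mk_value(pfn),
			       GFP_KERNEL));
}

/* Look up the pfn backing a private page offset, if any. */
static inline bool gmem_cache_lookup(struct gmem_pfn_cache *cache,
				     pgoff_t index, unsigned long *pfn)
{
	void *entry = xa_load(&cache->pfns, index);

	if (!entry)
		return false;
	*pfn = xa_to_value(entry);
	return true;
}

/* Drop an entry on private -> shared conversion or truncation. */
static inline void gmem_cache_erase(struct gmem_pfn_cache *cache,
				    pgoff_t index)
{
	xa_erase(&cache->pfns, index);
}

With something along these lines, splitting or merging a private 1G range
becomes bookkeeping on the offset -> pfn entries plus an invalidation,
rather than populating struct pages for every 4K subpage.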
> I am starting to think that the filemap replacement for private memory
> ranges should be done sooner rather than later; otherwise it will become
> more and more difficult as features land in guest_memfd that rely on the
> presence of the filemap.
>
> This discussion matters more for hugepages and PFN range allocations.
> I would like to ensure that we have consensus on this direction.
>
> [1] https://lpc.events/event/18/contributions/1764/
> [2] https://lore.kernel.org/kvm/CAGtprH8C4MQwVTFPBMbFWyW4BrK8-mDqjJn-UUFbFhw4w23f3A@xxxxxxxxxxxxxx/
> [3] https://lore.kernel.org/linux-mm/20240805093245.889357-1-jgowans@xxxxxxxxxx/
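For completeness, the "request pfns for offsets and get notified about
invalidation when the pfns go away" interaction above could be as thin as
an mmu_notifier-style callback list per gmem instance. Again, this is
only a sketch; none of these names exist today:

#include <linux/list.h>
#include <linux/types.h>

/* One registration per private-range user (KVM today, IOMMU later). */
struct gmem_private_notifier {
	struct list_head list;
	/* pfns backing page offsets [start, end) are about to go away */
	void (*invalidate)(struct gmem_private_notifier *n,
			   pgoff_t start, pgoff_t end);
};

/* Called by gmem before conversion/truncation drops private pfns. */
static void gmem_invalidate_private(struct list_head *notifiers,
				    pgoff_t start, pgoff_t end)
{
	struct gmem_private_notifier *n;

	list_for_each_entry(n, notifiers, list)
		n->invalidate(n, start, end);
}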