On 2/12/2025 4:09 PM, David Hildenbrand wrote: > On 10.02.25 07:32, Shivank Garg wrote: >> Previously, guest-memfd allocations were following local NUMA node id >> in absence of process mempolicy, resulting in random memory allocation. >> Moreover, mbind() couldn't be used since memory wasn't mapped to userspace >> in VMM. >> >> Enable NUMA policy support by implementing vm_ops for guest-memfd mmap >> operation. This allows VMM to map the memory and use mbind() to set the >> desired NUMA policy. The policy is then retrieved via >> mpol_shared_policy_lookup() and passed to filemap_grab_folio_mpol() to >> ensure that allocations follow the specified memory policy. >> >> This enables VMM to control guest memory NUMA placement by calling mbind() >> on the mapped memory regions, providing fine-grained control over guest >> memory allocation across NUMA nodes. > > Yes, I think that is the right direction, especially with upcoming in-place conversion of shared<->private in mind. > >> >> Suggested-by: David Hildenbrand <david@xxxxxxxxxx> >> Signed-off-by: Shivank Garg <shivankg@xxxxxxx> >> --- >> virt/kvm/guest_memfd.c | 84 +++++++++++++++++++++++++++++++++++++++--- >> 1 file changed, 78 insertions(+), 6 deletions(-) >> >> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c >> index b2aa6bf24d3a..e1ea8cb292fa 100644 >> --- a/virt/kvm/guest_memfd.c >> +++ b/virt/kvm/guest_memfd.c >> @@ -2,6 +2,7 @@ >> #include <linux/backing-dev.h> >> #include <linux/falloc.h> >> #include <linux/kvm_host.h> >> +#include <linux/mempolicy.h> >> #include <linux/pagemap.h> >> #include <linux/anon_inodes.h> >> @@ -11,8 +12,13 @@ struct kvm_gmem { >> struct kvm *kvm; >> struct xarray bindings; >> struct list_head entry; >> + struct shared_policy policy; >> }; >> +static struct mempolicy *kvm_gmem_get_pgoff_policy(struct kvm_gmem *gmem, >> + pgoff_t index, >> + pgoff_t *ilx); >> + >> /** >> * folio_file_pfn - like folio_file_page, but return a pfn. >> * @folio: The folio which contains this index. >> @@ -96,10 +102,20 @@ static int kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slot, >> * Ignore accessed, referenced, and dirty flags. The memory is >> * unevictable and there is no storage to write back to. >> */ >> -static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index) >> +static struct folio *kvm_gmem_get_folio(struct file *file, pgoff_t index) > > I'd probably do that change in a separate prep-patch; would remove some of the unrelated noise in this patch. Yes, I'll separate it. > >> { >> /* TODO: Support huge pages. */ >> - return filemap_grab_folio(inode->i_mapping, index); >> + struct folio *folio = NULL; > > No need to init folio. > >> + struct inode *inode = file_inode(file); >> + struct kvm_gmem *gmem = file->private_data; > > Prefer reverse christmas-tree (longest line first) as possible. > >> + struct mempolicy *policy; >> + pgoff_t ilx; > > Why do you return the ilx from kvm_gmem_get_pgoff_policy() if it is completely unused? > >> + >> + policy = kvm_gmem_get_pgoff_policy(gmem, index, &ilx); >> + folio = filemap_grab_folio_mpol(inode->i_mapping, index, policy); >> + mpol_cond_put(policy); > I'll remove the kvm_gmem_get_pgoff_policy. > The downside is that we always have to lookup the policy, even if we don't have to allocate anything because the pagecache already contains a folio. > > Would there be a way to lookup if there is something already allcoated (fast-path) and fallback to the slow-path (lookup policy+call filemap_grab_folio_mpol) only if that failed? > > Note that shmem.c does exactly that: shmem_alloc_folio() is only called after filemap_get_entry() told us that there is nothing. > Yes, It's doable. A filemap_get_folio() for fast-path: If it does not return folio, then falling back to current slowpath. >> + >> + return folio; ... >> +} >> +#endif /* CONFIG_NUMA */ >> static struct file_operations kvm_gmem_fops = { >> +#ifdef CONFIG_NUMA >> + .mmap = kvm_gmem_mmap, >> +#endif > > With Fuad's work, this will be unconditional, and you'd only set the kvm_gmem_vm_ops conditionally -- just like shmem.c. Maybe best to prepare for that already: allow unconditional mmap (Fuad will implement the faulting logic of shared pages, until then all accesses would SIGBUS I assume, did you try that?) and only mess with get_policy/set_policy. Yes, I'll change according to it. I have to try that out. Thanks, Shivank