Vlastimil Babka <vbabka@xxxxxxx> writes:

> On 2/28/25 18:25, Ackerley Tng wrote:
>> Shivank Garg <shivankg@xxxxxxx> writes:
>>
>>> Previously, guest-memfd allocations followed local NUMA node id in absence
>>> of process mempolicy, resulting in arbitrary memory allocation.
>>> Moreover, mbind() couldn't be used since memory wasn't mapped to userspace
>>> in the VMM.
>>>
>>> Enable NUMA policy support by implementing vm_ops for guest-memfd mmap
>>> operation. This allows the VMM to map the memory and use mbind() to set
>>> the desired NUMA policy. The policy is then retrieved via
>>> mpol_shared_policy_lookup() and passed to filemap_grab_folio_mpol() to
>>> ensure that allocations follow the specified memory policy.
>>>
>>> This enables the VMM to control guest memory NUMA placement by calling
>>> mbind() on the mapped memory regions, providing fine-grained control over
>>> guest memory allocation across NUMA nodes.
>>>
>>> The policy change only affect future allocations and does not migrate
>>> existing memory. This matches mbind(2)'s default behavior which affects
>>> only new allocations unless overridden with MPOL_MF_MOVE/MPOL_MF_MOVE_ALL
>>> flags, which are not supported for guest_memfd as it is unmovable.
>>>
>>> Suggested-by: David Hildenbrand <david@xxxxxxxxxx>
>>> Signed-off-by: Shivank Garg <shivankg@xxxxxxx>
>>> ---
>>>  virt/kvm/guest_memfd.c | 76 +++++++++++++++++++++++++++++++++++++++++-
>>>  1 file changed, 75 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
>>> index f18176976ae3..b3a8819117a0 100644
>>> --- a/virt/kvm/guest_memfd.c
>>> +++ b/virt/kvm/guest_memfd.c
>>> @@ -2,6 +2,7 @@
>>>  #include <linux/backing-dev.h>
>>>  #include <linux/falloc.h>
>>>  #include <linux/kvm_host.h>
>>> +#include <linux/mempolicy.h>
>>>  #include <linux/pagemap.h>
>>>  #include <linux/anon_inodes.h>
>>>
>>> @@ -11,8 +12,12 @@ struct kvm_gmem {
>>>          struct kvm *kvm;
>>>          struct xarray bindings;
>>>          struct list_head entry;
>>> +        struct shared_policy policy;
>>>  };
>>>
>>
>> struct shared_policy should be stored on the inode rather than the file,
>> since the memory policy is a property of the memory (struct inode),
>> rather than a property of how the memory is used for a given VM (struct
>> file).
>
> That makes sense. AFAICS shmem also uses inodes to store policy.
>
>> When the shared_policy is stored on the inode, intra-host migration [1]
>> will work correctly, since the while the inode will be transferred from
>> one VM (struct kvm) to another, the file (a VM's view/bindings of the
>> memory) will be recreated for the new VM.
>>
>> I'm thinking of having a patch like this [2] to introduce inodes.
>
> shmem has it easier by already having inodes
>
>> With this, we shouldn't need to pass file pointers instead of inode
>> pointers.
>
> Any downsides, besides more work needed? Or is it feasible to do it using
> files now and convert to inodes later?
>
> Feels like something that must have been discussed already, but I don't
> recall specifics.

Here's where Sean described file vs inode: "The inode is effectively the raw
underlying physical storage, while the file is the VM's view of that
storage." [1].
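
To make the file vs inode distinction concrete, here is a toy userspace
model in plain C. It is only a sketch and not kernel code: every toy_* name
is made up for illustration, and the single preferred_node field is an
assumed stand-in for struct shared_policy. It shows why a policy kept on the
inode survives the file being recreated for a new VM.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a NUMA policy (not struct shared_policy). */
struct toy_policy {
        int preferred_node;     /* -1 means no policy has been set */
};

/* Models the inode: the underlying memory and its properties. */
struct toy_inode {
        struct toy_policy policy;
};

/* Models the file: one VM's view/bindings of that memory. */
struct toy_file {
        int vm_id;
        struct toy_inode *inode;
};

struct toy_file *toy_open(struct toy_inode *inode, int vm_id)
{
        struct toy_file *f = malloc(sizeof(*f));

        if (!f)
                return NULL;
        f->vm_id = vm_id;
        f->inode = inode;
        return f;
}

/* Setting a policy through any file lands on the shared inode. */
void toy_set_policy(struct toy_file *f, int node)
{
        f->inode->policy.preferred_node = node;
}

int toy_get_policy(const struct toy_file *f)
{
        return f->inode->policy.preferred_node;
}

/* Toy "intra-host migration": drop the old file, create a new one
 * for the new VM on top of the same inode. */
struct toy_file *toy_migrate(struct toy_file *old, int new_vm_id)
{
        struct toy_inode *inode = old->inode;

        free(old);
        return toy_open(inode, new_vm_id);
}

/* Returns the policy node seen by the destination VM after migration. */
int toy_demo(void)
{
        struct toy_inode inode = { .policy = { .preferred_node = -1 } };
        struct toy_file *src, *dst;
        int node;

        src = toy_open(&inode, 1);
        assert(src);
        toy_set_policy(src, 1);         /* VM 1's VMM picks node 1 */

        dst = toy_migrate(src, 2);      /* src is freed here */
        assert(dst && dst->vm_id == 2);

        node = toy_get_policy(dst);     /* policy lives on the inode */
        free(dst);
        return node;
}
```

In this model, had the policy been a member of struct toy_file instead, it
would have been freed together with the old file in toy_migrate() and the
new VM would start with no policy.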

I guess you're right that for now there is little distinction between file
and inode and using file should be feasible, but I feel that this dilutes
the original intent. Something like [2] doesn't seem like too big of a
change and could perhaps be included earlier rather than later, since it
will also contribute to support for restricted mapping [3].

[1] https://lore.kernel.org/all/ZLGiEfJZTyl7M8mS@xxxxxxxxxx/
[2] https://lore.kernel.org/all/d1940d466fc69472c8b6dda95df2e0522b2d8744.1726009989.git.ackerleytng@xxxxxxxxxx/
[3] https://lore.kernel.org/all/20250117163001.2326672-1-tabba@xxxxxxxxxx/T/