On Tue, Mar 04, 2025, David Hildenbrand wrote:
> On 04.03.25 16:30, Sean Christopherson wrote:
> > On Tue, Mar 04, 2025, Ackerley Tng wrote:
> > > Vlastimil Babka <vbabka@xxxxxxx> writes:
> > > > > struct shared_policy should be stored on the inode rather than the file,
> > > > > since the memory policy is a property of the memory (struct inode),
> > > > > rather than a property of how the memory is used for a given VM (struct
> > > > > file).
> > > >
> > > > That makes sense. AFAICS shmem also uses inodes to store policy.
> > > >
> > > > > When the shared_policy is stored on the inode, intra-host migration [1]
> > > > > will work correctly, since, while the inode will be transferred from
> > > > > one VM (struct kvm) to another, the file (a VM's view/bindings of the
> > > > > memory) will be recreated for the new VM.
> > > > >
> > > > > I'm thinking of having a patch like this [2] to introduce inodes.
> > > >
> > > > shmem has it easier by already having inodes.
> > > >
> > > > > With this, we shouldn't need to pass file pointers instead of inode
> > > > > pointers.
> > > >
> > > > Any downsides, besides more work needed? Or is it feasible to do it using
> > > > files now and convert to inodes later?
> > > >
> > > > Feels like something that must have been discussed already, but I don't
> > > > recall specifics.
> > >
> > > Here's where Sean described file vs inode: "The inode is effectively the
> > > raw underlying physical storage, while the file is the VM's view of that
> > > storage." [1]
> > >
> > > I guess you're right that for now there is little distinction between
> > > file and inode, and using the file should be feasible, but I feel that
> > > this dilutes the original intent.
> >
> > Hmm, and using the file would be actively problematic at some point. One
> > could argue that NUMA policy is a property of the VM accessing the memory,
> > i.e. that two VMs mapping the same guest_memfd could want different
> > policies. But in practice, that would allow for conflicting requirements,
> > e.g. different policies in each VM for the same chunk of memory, and would
> > likely lead to surprising behavior due to having to manually do mbind()
> > for every VM/file view.
>
> I think that's the same behavior with shmem? I mean, if you have two people
> asking for different things for the same MAP_SHARED file range, surprises
> are unavoidable.

Yeah, I was specifically thinking of the case where a secondary mapping
doesn't do mbind() at all, and so could end up effectively polluting
guest_memfd with "bad" allocations.
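
For anyone following along, a minimal sketch of what per-inode policy storage
could look like, mirroring shmem's existing pattern (SHMEM_I(), shmem_set_policy).
The gmem_* names are hypothetical placeholders for illustration, not taken
from the actual patch [2]:

	#include <linux/fs.h>
	#include <linux/mempolicy.h>
	#include <linux/mm.h>

	struct gmem_inode_info {
		struct shared_policy policy;	/* per-inode NUMA policy tree */
		struct inode vfs_inode;
	};

	static inline struct gmem_inode_info *GMEM_I(struct inode *inode)
	{
		return container_of(inode, struct gmem_inode_info, vfs_inode);
	}

	/* At inode creation, start with no policy (as shmem_get_inode() does): */
	mpol_shared_policy_init(&GMEM_I(inode)->policy, NULL);

	/*
	 * A ->set_policy vm_ops hook would then resolve policy via the inode,
	 * so every file/VMA view of the same memory shares one policy tree:
	 */
	static int gmem_set_policy(struct vm_area_struct *vma,
				   struct mempolicy *mpol)
	{
		struct inode *inode = file_inode(vma->vm_file);

		return mpol_set_shared_policy(&GMEM_I(inode)->policy, vma, mpol);
	}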
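And the userspace side of the "secondary mapping never calls mbind()" concern,
purely hypothetical and assuming the guest_memfd range is mmap()able at all
(itself a separate discussion); size and gmem_fd are assumed to exist:

	#include <numaif.h>		/* mbind(); link with -lnuma */
	#include <sys/mman.h>

	/*
	 * With per-file policy, each VM's view would need its own mbind();
	 * a view that skips this step would fault pages under the default
	 * policy, "polluting" the shared inode's memory.  With per-inode
	 * policy, one mbind() through any view covers all of them.
	 */
	void *addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
			  gmem_fd, 0);
	unsigned long nodemask = 1UL << 1;	/* bind to NUMA node 1 */

	mbind(addr, size, MPOL_BIND, &nodemask, 8 * sizeof(nodemask), 0);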