Hi Sean,
Thanks for implementing this POC!
I’ve started porting the selftests (both Chao’s and those I added [1]).
guest mem seems to cover the use cases that have been discussed and
proposed so far, but I still need to figure out how gmem can work with
+ hugetlbfs
+ specifying/storing memory policy (for NUMA node bindings)
+ memory accounting - we may need to account for this memory separately,
so that guest mem shows up on its own in /proc/meminfo and similar
places.
One issue I’ve found so far is that the pointer to kvm (gmem->kvm) is
never cleaned up, so it is possible to crash the host kernel in the
following way:
1. Create a KVM VM
2. Create a guest mem fd on that VM
3. Create a memslot with the guest mem fd (hence binding the fd to the
VM)
4. Close/destroy the KVM VM
5. Call fallocate(PUNCH_HOLE) on the guest mem fd, which uses gmem->kvm
when it tries to do invalidation.
I then tried to clean up the gmem->kvm pointer during unbinding when the
KVM VM is destroyed.
That works, but then I realized there’s an even simpler way to trigger
the use-after-free:
1. Create a KVM VM
2. Create a guest mem fd on that VM
3. Close/destroy the KVM VM
4. Call fallocate(PUNCH_HOLE) on the guest mem fd, which uses gmem->kvm
when it tries to do invalidation.
Perhaps binding should mean setting the gmem->kvm pointer in addition to
updating gmem->bindings, with unbinding clearing both. This would make
binding and unbinding symmetric and avoid the use-after-frees described
above.
This would also mean that creating a guest mem fd no longer depends on
the VM. Perhaps we could make creating a gmem fd a system ioctl (like
KVM_GET_API_VERSION and KVM_CREATE_VM) instead of a VM ioctl?
[1]
https://lore.kernel.org/all/cover.1678926164.git.ackerleytng@xxxxxxxxxx/T/
Ackerley