On Fri, May 05, 2023 at 07:39:36PM +0000, Ackerley Tng wrote: > > Hi Sean, > > Thanks for implementing this POC! > > I’ve started porting the selftests (both Chao’s and those I added [1]). Hi Sean/Ackerley, Thanks for doing that. Overall making gmem a KVM ioctl() looks good to me and it should also play nice with Intel TDX. Besides what Ackerley mentioned below, I think we haven't discussed device assignment, which will be supported in not too long distance. Current VFIO_IOMMU_MAP_DMA consumes virtual address so that needs to be fixed for fd-based memory anyway, and the fix looks not related to whether this being a syscall() or a KVM ioctl(). There will be some initialization sequence dependency, e.g. if gmem is finally a VM-scope ioctl() then we need VM created first before can we map fd-based memory in VFIO, but that sounds not an issue at all. I also see Vlastimil/David expressed their preference on ioctl. So maybe we can move forward on your current PoC. Do you already have a plan to post a formal version? Chao > > guest mem seems to cover the use cases that have been discussed and > proposed so far, but I still need to figure out how gmem can work with > > + hugetlbfs > + specification of/storing memory policy (for NUMA node bindings) > + memory accounting - we may need to account for memory used separately, > so that guest mem shows up separately on /proc/meminfo and similar > places. > > One issue I’ve found so far is that the pointer to kvm (gmem->kvm) is > not cleaned up, and hence it is possible to crash the host kernel in the > following way > > 1. Create a KVM VM > 2. Create a guest mem fd on that VM > 3. Create a memslot with the guest mem fd (hence binding the fd to the > VM) > 4. Close/destroy the KVM VM > 5. Call fallocate(PUNCH_HOLE) on the guest mem fd, which uses gmem->kvm > when it tries to do invalidation. > > I then tried to clean up the gmem->kvm pointer during unbinding when the > KVM VM is destroyed. > > That works, but then I realized there’s a simpler way to use the pointer > after freeing: > > 1. Create a KVM VM > 2. Create a guest mem fd on that VM > 3. Close/destroy the KVM VM > 4. Call fallocate(PUNCH_HOLE) on the guest mem fd, which uses gmem->kvm > when it tries to do invalidation. > > Perhaps binding should mean setting the gmem->kvm pointer in addition to > gmem->bindings. This makes binding and unbinding symmetric and avoids > the use-after-frees described above. > > This also means that creating a guest mem fd is no longer dependent on > the VM. Perhaps we can make creating a gmem fd a system ioctl (like > KVM_GET_API_VERSION and KVM_CREATE_VM) instead of a vm ioctl? > > [1] > https://lore.kernel.org/all/cover.1678926164.git.ackerleytng@xxxxxxxxxx/T/ > > Ackerley