Hi David,

> 
> On 27.06.23 08:37, Kasireddy, Vivek wrote:
> > Hi David,
> >
> 
> Hi!
> 
> sorry for taking a bit longer to reply lately.
No problem.

> 
> [...]
> 
> >>> Sounds right, maybe it needs to go back to the old GUP solution, though, as
> >>> mmu notifiers are also mm-based not fd-based. Or to be explicit, I think
> >>> it'll be pin_user_pages(FOLL_LONGTERM) with the new API. It'll also solve
> >>> the movable pages issue on pinning.
> >>
> >> It better should be pin_user_pages(FOLL_LONGTERM). But I'm afraid we
> >> cannot achieve that without breaking the existing kernel interface ...
> > Yeah, as you suggest, we unfortunately cannot go back to using GUP
> > without breaking udmabuf_create UAPI that expects memfds and file
> > offsets.
> >
> >>
> >> So we might have to implement the same page migration as gup does on
> >> FOLL_LONGTERM here ... maybe there are more such cases/drivers that
> >> actually require that handling when simply taking pages out of the
> >> memfd, believing they can hold on to them forever.
> > IIUC, I don't think just handling the page migration in udmabuf is going to
> > cut it. It might require active cooperation of the Guest GPU driver as well
> > if this is even feasible.
> 
> The idea is, that once you extract the page from the memfd and it
> resides somewhere bad (MIGRATE_CMA, ZONE_MOVABLE), you trigger page
> migration. Essentially what migrate_longterm_unpinnable_pages() does:
So, IIUC, it looks like calling check_and_migrate_movable_pages() at the
time of creation (udmabuf_create) and when we get notified about something
like FALLOC_FL_PUNCH_HOLE will be all that needs to be done in udmabuf?

> 
> Why would the guest driver have to be involved? It shouldn't care about
> page migration in the hypervisor.
Yeah, it appears that the page migration would be transparent to the
Guest driver.

> 
> [...]
> 
> >> balloon, and then using that memory for communicating with the device]
> >>
> >> Maybe it's all fine with udmabuf because of the way it is setup/torn
> >> down by the guest driver. Unfortunately I can't tell.
> > Here are the functions used by virtio-gpu (Guest GPU driver) to allocate
> > pages for its resources:
> > __drm_gem_shmem_create:
> > https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/drm_gem_shmem_helper.c#L97
> > Interestingly, the comment in the above function says that the pages
> > should not be allocated from the MOVABLE zone.
> 
> It doesn't add GFP_MOVABLE, so pages don't end up in
> ZONE_MOVABLE/MIGRATE_CMA *in the guest*. But we care about the
> ZONE_MOVABLE/MIGRATE_CMA *in the host*. (what the guest does is
> right, though)
> 
> IOW, what udmabuf does with guest memory on the hypervisor side, not the
> guest driver on the guest side.
Ok, got it.

> 
> > The pages along with their dma addresses are then extracted and shared
> > with Qemu using these two functions:
> > drm_gem_get_pages:
> > https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/drm_gem.c#L534
> > virtio_gpu_object_shmem_init:
> > https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/virtio/virtgpu_object.c#L135
> 
> ^ so these two target the guest driver as well, right? IOW, there is a
> memfd (shmem) in the guest that the guest driver uses to allocate pages
> from and there is the memfd in the hypervisor to back guest RAM.
> 
> The latter gets registered with udmabuf.
Yes, that's exactly what happens.
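
BTW, just to make sure I correctly understand your
migrate_longterm_unpinnable_pages() suggestion above, here is a rough,
untested sketch of the kind of helper I have in mind for udmabuf. The names
udmabuf_migrate_movable_pages() and udmabuf_refill_pages() are made up for
illustration, and check_and_migrate_movable_pages() is currently static in
mm/gup.c, so it would either have to be exported or open-coded:

/*
 * Rough sketch only: migrate any pages we grabbed from the memfd that
 * live in ZONE_MOVABLE or a CMA region, similar to what GUP does for
 * FOLL_LONGTERM pins. Assumes check_and_migrate_movable_pages() (static
 * in mm/gup.c today) is made available to udmabuf.
 */
static int udmabuf_migrate_movable_pages(struct udmabuf *ubuf)
{
	long ret;

	for (;;) {
		/*
		 * Returns 0 if all pages can stay where they are, -EAGAIN
		 * after it migrated (and dropped) some of them, or a
		 * negative error code.
		 */
		ret = check_and_migrate_movable_pages(ubuf->pagecount,
						      ubuf->pages);
		if (ret != -EAGAIN)
			return ret;

		/*
		 * Hypothetical helper: look the pages up from the memfd
		 * again after the migration and retry the check.
		 */
		ret = udmabuf_refill_pages(ubuf);
		if (ret)
			return ret;
	}
}

The idea would be to call something like this at the end of udmabuf_create()
and again from the FALLOC_FL_PUNCH_HOLE notification path. Does that match
what you had in mind?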
> 
> > Qemu then translates the dma addresses into file offsets and creates
> > udmabufs -- as an optimization to avoid data copies only if blob is set
> > to true.
> 
> If the guest OS doesn't end up freeing/reallocating that memory while
> it's registered with udmabuf in the hypervisor, then we should be fine.
IIUC, udmabuf does get notified when something like that happens.

Thanks,
Vivek

> 
> Because that way, the guest won't end up trigger MADV_DONTNEED by
> "accident".
> 
> --
> Cheers,
> 
> David / dhildenb
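
PS: for completeness, this is roughly what the Qemu side boils down to once
the dma addresses have been translated into memfd offsets -- a minimal
sketch of the UDMABUF_CREATE ioctl usage (create_udmabuf() is just an
illustrative helper name; error handling is trimmed, offset and size are
assumed to be page aligned, and the memfd is assumed to have F_SEAL_SHRINK
set):

#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/udmabuf.h>

/* Turn a (memfd, offset, size) range of guest memory into a dma-buf fd. */
static int create_udmabuf(int memfd, uint64_t offset, uint64_t size)
{
	struct udmabuf_create create = {
		.memfd  = memfd,
		.flags  = UDMABUF_FLAGS_CLOEXEC,
		.offset = offset,	/* page aligned */
		.size   = size,		/* page aligned */
	};
	int devfd, buffd;

	devfd = open("/dev/udmabuf", O_RDWR);
	if (devfd < 0)
		return -1;

	/* Returns a new dma-buf fd backed by the memfd pages, or -1. */
	buffd = ioctl(devfd, UDMABUF_CREATE, &create);
	close(devfd);
	return buffd;
}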