Re: [PATCH v1 0/2] udmabuf: Add back support for mapping hugetlb pages

On 27.06.23 08:37, Kasireddy, Vivek wrote:
> Hi David,


Hi!

sorry for taking a bit longer to reply lately.

[...]

>>> Sounds right, maybe it needs to go back to the old GUP solution, though, as
>>> mmu notifiers are also mm-based, not fd-based. Or to be explicit, I think
>>> it'll be pin_user_pages(FOLL_LONGTERM) with the new API. It'll also solve
>>> the movable pages issue on pinning.
>>
>> It really should be pin_user_pages(FOLL_LONGTERM). But I'm afraid we
>> cannot achieve that without breaking the existing kernel interface ...
> Yeah, as you suggest, we unfortunately cannot go back to using GUP
> without breaking the udmabuf_create UAPI, which expects memfds and file
> offsets.
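
Just to spell out the mismatch: the FOLL_LONGTERM path wants user virtual
addresses, which udmabuf_create never gets to see; it only gets a memfd and
a file offset. A rough sketch, illustration only (pin_range_longterm() is a
made-up helper, not anything udmabuf could actually call today):

#include <linux/mm.h>

static int pin_range_longterm(unsigned long uaddr, int nr_pages,
			      struct page **pages)
{
	int pinned;

	/*
	 * FOLL_LONGTERM makes GUP migrate the pages out of
	 * ZONE_MOVABLE/MIGRATE_CMA before taking the long-term pin.
	 */
	pinned = pin_user_pages_fast(uaddr, nr_pages,
				     FOLL_WRITE | FOLL_LONGTERM, pages);
	if (pinned < 0)
		return pinned;
	if (pinned != nr_pages) {
		unpin_user_pages(pages, pinned);
		return -EFAULT;
	}
	return 0;
}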


>> So we might have to implement the same page migration as gup does on
>> FOLL_LONGTERM here ... maybe there are more such cases/drivers that
>> actually require that handling when simply taking pages out of the
>> memfd, believing they can hold on to them forever.
> IIUC, I don't think just handling the page migration in udmabuf is going
> to cut it. It might require active cooperation of the guest GPU driver
> as well, if this is even feasible.

The idea is that once you extract the page from the memfd and it resides somewhere bad (MIGRATE_CMA, ZONE_MOVABLE), you trigger page migration. Essentially what migrate_longterm_unpinnable_pages() does:

Why would the guest driver have to be involved? It shouldn't care about
page migration in the hypervisor.
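
Something like the following sketch, mirroring the collect step of
migrate_longterm_unpinnable_pages() (illustration only: the function name is
made up, hugetlb folios would need isolate_hugetlb() instead, and
folio_isolate_lru() is declared in mm/internal.h, so this logic would have to
live in core mm rather than in the udmabuf driver):

#include <linux/list.h>
#include <linux/mm.h>

/*
 * Walk the pages just grabbed from the memfd and collect every folio
 * that must not be pinned long-term (ZONE_MOVABLE, MIGRATE_CMA, ...)
 * onto a list for migration.
 */
static unsigned long udmabuf_collect_movable(struct page **pages,
					     unsigned long nr_pages,
					     struct list_head *movable)
{
	unsigned long i, nr_movable = 0;

	for (i = 0; i < nr_pages; i++) {
		struct folio *folio = page_folio(pages[i]);

		if (folio_is_longterm_pinnable(folio))
			continue;

		/* Take the folio off the LRU so migration can grab it. */
		if (folio_isolate_lru(folio)) {
			list_add_tail(&folio->lru, movable);
			nr_movable++;
		}
	}

	/*
	 * The list would then be fed to migrate_pages() with a GFP_USER
	 * allocation target, the same way migrate_longterm_unpinnable_pages()
	 * in mm/gup.c does it; the exact migrate_pages() signature differs
	 * between kernel versions, so that call is omitted here.
	 */
	return nr_movable;
}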

[...]

>> balloon, and then using that memory for communicating with the device]
>>
>> Maybe it's all fine with udmabuf because of the way it is set up/torn
>> down by the guest driver. Unfortunately I can't tell.
> Here are the functions used by virtio-gpu (the guest GPU driver) to
> allocate pages for its resources:
> __drm_gem_shmem_create: https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/drm_gem_shmem_helper.c#L97
> Interestingly, the comment in the above function says that the pages
> should not be allocated from the MOVABLE zone.

It doesn't add __GFP_MOVABLE, so the pages don't end up in
ZONE_MOVABLE/MIGRATE_CMA *in the guest*. But we care about
ZONE_MOVABLE/MIGRATE_CMA *in the host*. (What the guest does is right,
though.)

IOW, what matters is what udmabuf does with the guest memory on the
hypervisor side, not what the guest driver does on the guest side.
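
For reference, the two gfp masks in question, as I read them in current
trees (double-check against your version):

/* Guest side, __drm_gem_shmem_create(): the GEM shmem pages are kept
 * pinned, so __GFP_MOVABLE is deliberately not part of the mask. */
mapping_set_gfp_mask(obj->filp->f_mapping,
		     GFP_HIGHUSER | __GFP_RETRY_MAYFAIL | __GFP_NOWARN);

/* Host side: a plain memfd/shmem inode defaults to GFP_HIGHUSER_MOVABLE
 * (see shmem_get_inode()), i.e. __GFP_MOVABLE is set and the guest-RAM
 * pages may land in ZONE_MOVABLE or on MIGRATE_CMA pageblocks. */
mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER_MOVABLE);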

> The pages along with their dma addresses are then extracted and shared
> with QEMU using these two functions:
> drm_gem_get_pages: https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/drm_gem.c#L534
> virtio_gpu_object_shmem_init: https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/virtio/virtgpu_object.c#L135

^ So these two target the guest driver as well, right? IOW, there is a
memfd (shmem) in the guest that the guest driver allocates pages from,
and there is the memfd in the hypervisor that backs guest RAM.

The latter gets registered with udmabuf.

> QEMU then translates the dma addresses into file offsets and creates
> udmabufs -- but only if blob is set to true, as an optimization to
> avoid data copies.
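
For completeness, that QEMU step is essentially the UDMABUF_CREATE ioctl.
A minimal sketch (create_guest_ram_dmabuf() is a made-up name; the offset
is whatever QEMU computed from the guest physical address):

#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/udmabuf.h>

static int create_guest_ram_dmabuf(int guest_ram_memfd,
				   uint64_t offset, uint64_t len)
{
	struct udmabuf_create create = {
		.memfd  = guest_ram_memfd,	/* memfd backing guest RAM */
		.flags  = UDMABUF_FLAGS_CLOEXEC,
		.offset = offset,		/* must be page-aligned */
		.size   = len,			/* must be page-aligned */
	};
	int devfd, buf_fd;

	devfd = open("/dev/udmabuf", O_RDWR | O_CLOEXEC);
	if (devfd < 0)
		return -1;

	/* On success this returns a dma-buf fd backed by the memfd's pages. */
	buf_fd = ioctl(devfd, UDMABUF_CREATE, &create);
	close(devfd);
	return buf_fd;
}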

If the guest OS doesn't end up freeing/reallocating that memory while it's registered with udmabuf in the hypervisor, then we should be fine.

Because that way, the guest won't end up triggering MADV_DONTNEED by "accident".
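
Just to illustrate what freeing/reallocating would mean for the memfd case:
once the hypervisor punches a hole into the backing memfd for a range that
was previously handed to UDMABUF_CREATE, the udmabuf and the file are
silently disconnected. A sketch (discard_guest_ram() is made up for
illustration):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>

/*
 * Discard a range of the guest-RAM memfd. The next write to that range
 * allocates *new* shmem pages, while a udmabuf created earlier still
 * holds references to the *old* pages.
 */
static int discard_guest_ram(int guest_ram_memfd, uint64_t offset,
			     uint64_t len)
{
	return fallocate(guest_ram_memfd,
			 FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
			 offset, len);
}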

--
Cheers,

David / dhildenb



