RE: [PATCH] udmabuf: revert 'Add support for mapping hugepages (v4)'

"Kasireddy, Vivek" <vivek.kasireddy@xxxxxxxxx> · Wed, 14 Jun 2023 07:51:50 +0000

Hi David,

> 
> On 13.06.23 10:26, Kasireddy, Vivek wrote:
> > Hi David,
> >
> >>
> >> On 12.06.23 09:10, Kasireddy, Vivek wrote:
> >>> Hi Mike,
> >>
> >> Hi Vivek,
> >>
> >>>
> >>> Sorry for the late reply; I just got back from vacation.
> >>> If it is unsafe to directly use the subpages of a hugetlb page, then
> reverting
> >>> this patch seems like the only option for addressing this issue
> immediately.
> >>> So, this patch is
> >>> Acked-by: Vivek Kasireddy <vivek.kasireddy@xxxxxxxxx>
> >>>
> >>> As far as the use-case is concerned, there are two main users of the
> >> udmabuf
> >>> driver: Qemu and CrosVM VMMs. However, it appears Qemu is the only
> >> one
> >>> that uses hugetlb pages (when hugetlb=on is set) as the backing store for
> >>> Guest (Linux, Android and Windows) system memory. The main goal is
> to
> >>> share the pages associated with the Guest allocated framebuffer (FB)
> with
> >>> the Host GPU driver and other components in a zero-copy way. To that
> >> end,
> >>> the guest GPU driver (virtio-gpu) allocates 4k size pages (associated with
> >>> the FB) and pins them before sharing the (guest) physical (or dma)
> >> addresses
> >>> (and lengths) with Qemu. Qemu then translates the addresses into file
> >>> offsets and shares these offsets with udmabuf.
> >>
> >> Is my understanding correct, that we can effectively long-term pin
> >> (worse than mlock) 64 MiB per UDMABUF_CREATE, allowing eventually
> !root
> > The 64 MiB limit is the theoretical upper bound that we have not seen hit in
> > practice. Typically, for a 1920x1080 resolution (commonly used in Guests),
> > the size of the FB is ~8 MB (1920x1080x4). And, most modern Graphics
> > compositors flip between two FBs.
> >
> 
> Okay, but users with privileges to open that file can just create as
> many as they want? I think I'll have to play with it.
Yeah, unfortunately, we are not restricting the total number of FBs or other
buffers that are mapped by udambuf per user. We'll definitely try to add a
patch to align it with mlock limits.

> 
> >> users
> >>
> >> ll /dev/udmabuf
> >> crw-rw---- 1 root kvm 10, 125 12. Jun 08:12 /dev/udmabuf
> >>
> >> to bypass there effective MEMLOCK limit, fragmenting physical memory
> and
> >> breaking swap?
> > Right, it does not look like the mlock limits are honored.
> >
> 
> That should be added.
> 
> >>
> >> Regarding the udmabuf_vm_fault(), I assume we're mapping pages we
> >> obtained from the memfd ourselves into a special VMA (mmap() of the
> > mmap operation is really needed only if any component on the Host needs
> > CPU access to the buffer. But in most scenarios, we try to ensure direct GPU
> > access (h/w acceleration via gl) to these pages.
> >
> >> udmabuf). I'm not sure how well shmem pages are prepared for getting
> >> mapped by someone else into an arbitrary VMA (page->index?).
> > Most drm/gpu drivers use shmem pages as the backing store for FBs and
> > other buffers and also provide mmap capability. What concerns do you see
> > with this approach?
> 
> Are these mmaping the pages the way udmabuf maps these pages (IOW,
> on-demand fault where we core-mm will adjust the mapcount etc)?
> 
> Skimming over at shmem_read_mapping_page() users, I assume most of
> them
> use a VM_PFNMAP mapping (or don't mmap them at all), where we won't be
> messing with the struct page at all.
> 
> (That might even allow you to mmap hugetlb sub-pages, because the struct
> page -- and mapcount -- will be ignored completely and not touched.)
Oh, are you suggesting that if we do vma->vm_flags |= VM_PFNMAP
in the mmap handler (mmap_udmabuf) and also do
vmf_insert_pfn(vma, vmf->address, page_to_pfn(page))
instead of
vmf->page = ubuf->pages[pgoff];
get_page(vmf->page);

in the vma fault handler (udmabuf_vm_fault), we can avoid most of the
pitfalls you have identified -- including with the usage of hugetlb subpages? 

> 
> >
> >>
> >> ... also, just imagine someone doing FALLOC_FL_PUNCH_HOLE /
> ftruncate()
> >> on the memfd. What's mapped into the memfd no longer corresponds to
> >> what's pinned / mapped into the VMA.
> > IIUC, making use of the DMA_BUF_IOCTL_SYNC ioctl would help with any
> > coherency issues:
> > https://www.kernel.org/doc/html/v6.2/driver-api/dma-
> buf.html#c.dma_buf_sync
> >
> 
> Would it as of now? udmabuf_create() pulls the shmem pages out of the
> memfd, not sure how DMA_BUF_IOCTL_SYNC would help to update that
> whenever the pages inside the memfd would change (e.g.,
> FALLOC_FL_PUNCH_HOLE + realloc).
AFAIU, Qemu owns the memfd and if it were to somehow punch a hole in 
its memfd, it is not clear to me if that would affect the Guest pinned FB
pages as well. Regardless, what do you suggest is the right thing to do in
this case on the udmabuf side?

Thanks,
Vivek

> 
> But that's most probably simply "not supported".
> 
> >>
> >>
> >> Was linux-mm (and especially shmem maintainers, ccing Hugh) involved
> in
> >> the upstreaming of udmabuf?
> > It does not appear so from the link below although other key lists were
> cc'd:
> > https://patchwork.freedesktop.org/patch/246100/?series=39879&rev=7
> 
> That's unfortunate :(
> 
> --
> Cheers,
> 
> David / dhildenb