Re: [RFC v1 1/3] mm/mmu_notifier: Add a new notifier for mapping updates (new pages)

Peter Xu <peterx@xxxxxxxxxx> · Thu, 27 Jul 2023 17:43:13 -0400

Hi, Vivek,

On Tue, Jul 25, 2023 at 10:24:21PM +0000, Kasireddy, Vivek wrote:
> Hi Hugh,
> 
> > 
> > On Mon, 24 Jul 2023, Kasireddy, Vivek wrote:
> > > Hi Jason,
> > > > On Mon, Jul 24, 2023 at 07:54:38AM +0000, Kasireddy, Vivek wrote:
> > > >
> > > > > > I'm not at all familiar with the udmabuf use case but that sounds
> > > > > > brittle and effectively makes this notifier udmabuf specific right?
> > > > > > Oh, Qemu uses the udmabuf driver to provide Host Graphics components
> > > > > (such as Spice, Gstreamer, UI, etc) zero-copy access to Guest created
> > > > > buffers. In other words, from a core mm standpoint, udmabuf just
> > > > > collects a bunch of pages (associated with buffers) scattered inside
> > > > > the memfd (Guest ram backed by shmem or hugetlbfs) and wraps
> > > > > them in a dmabuf fd. And, since we provide zero-copy access, we
> > > > > use DMA fences to ensure that the components on the Host and
> > > > > Guest do not access the buffer simultaneously.
> > > >
> > > > So why do you need to track updates proactively like this?
> > > As David noted in the earlier series, if Qemu punches a hole in its memfd
> > > that goes through pages that are registered against a udmabuf fd, then
> > > udmabuf needs to update its list with new pages when the hole gets
> > > filled after (guest) writes. Otherwise, we'd run into the coherency
> > > problem (between udmabuf and memfd) as demonstrated in the selftest
> > > (patch #3 in this series).
> > 
> > Wouldn't this all be very much better if Qemu stopped punching holes there?
> I think holes can be punched anywhere in the memfd for various reasons. Some

I've only just started reading this thread and haven't finished all of it
yet, but so far I'm not sure whether this is right at all.

udmabuf is a file, which means it should follow file semantics.  The mmu
notifier is per-mm, otoh.

Imagine that for some reason QEMU mapped the guest pages twice.  udmabuf is
created with vma1, so udmabuf registers for mm changes over vma1 only.

However, the shmem/hugetlb page cache can be populated through either vma1
or vma2.  That means when it is populated through vma2, udmabuf won't get
any update notification at all, and the udmabuf pages can still be stale.
The same applies when multi-process QEMU is used, where we can have vma1 in
QEMU and vma2 in another process such as vhost-user.
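
To illustrate the two-mapping case from userspace, here is a minimal sketch
(not udmabuf code).  It assumes a temporary file as a stand-in for the guest
memfd (os.memfd_create() would be the closer match on Linux), and the
function name is made up for the example.  It only shows that a page
populated through one mapping is the same page-cache page seen through the
other; it cannot show the notifier side.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE

def populate_via_second_mapping():
    """Write through vma2 and return what vma1 sees at the same offset."""
    with tempfile.TemporaryFile() as f:      # assumption: stand-in for the guest memfd
        fd = f.fileno()
        os.ftruncate(fd, 4 * PAGE)
        vma1 = mmap.mmap(fd, 4 * PAGE)       # the mapping udmabuf would be watching
        vma2 = mmap.mmap(fd, 4 * PAGE)       # a second MAP_SHARED mapping of the file
        vma2[PAGE:PAGE + 5] = b"hello"       # the page is populated through vma2
        seen = bytes(vma1[PAGE:PAGE + 5])    # same page-cache page, read via vma1
        vma1.close()
        vma2.close()
    return seen
```

The population happened entirely through vma2, so anything keyed on vma1
alone has no reason to hear about it.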

I think the trick here is that we tried to "hide" the fact that these are
actually normal file pages, but we're doing PFNMAP on them... and then we
want the file features back, like hole punching.

If we used normal file operations, everything would just work: TRUNCATE
would unmap the host-mapped frame buffers when needed, and on the next
access they would be faulted in on demand from the page cache.  We seem to
be trying to reinvent "truncation" for pfnmap, but an mmu notifier doesn't
sound like the right tool for this, at least.
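
A rough userspace sketch of that behaviour, under some assumptions: a
temporary file stands in for the guest memory file, plain ftruncate() is
used as the simplest form of truncation (a hole punch should behave the
same way for the punched range), and the function name is hypothetical.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE

def truncate_then_refault():
    """Return what a long-lived mapping reads after truncation and after a refill."""
    with tempfile.TemporaryFile() as f:       # assumption: stand-in for guest memory
        fd = f.fileno()
        os.ftruncate(fd, 2 * PAGE)
        consumer = mmap.mmap(fd, 2 * PAGE)    # long-lived "host" mapping
        consumer[0:3] = b"old"

        # Truncating drops the pages from the page cache and unmaps them
        # from every mapping of the file, including `consumer`.
        os.ftruncate(fd, 0)
        # Grow the file again so the range is valid; touching the mapping
        # while the file is empty would raise SIGBUS.
        os.ftruncate(fd, 2 * PAGE)

        after_truncate = bytes(consumer[0:3])  # faults in a fresh zero page
        os.pwrite(fd, b"new", 0)               # the "guest" refills the range
        after_refill = bytes(consumer[0:3])    # same mapping sees the new data
        consumer.close()
    return after_truncate, after_refill
```

The consumer never re-registers anything; the fault path keeps it in sync
with the file.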

> of the use-cases where this would be done were identified by David. Here is what
> he said in an earlier discussion:
> "There are *probably* more issues on the QEMU side when udmabuf is paired 
> with things like MADV_DONTNEED/FALLOC_FL_PUNCH_HOLE used for 
> virtio-balloon, virtio-mem, postcopy live migration, ... for example, in"

Now after seeing this, I'm truly wondering whether we can still simply
use the file semantics we already have (for shmem/hugetlb/...), or
is it a must that we use a single fd to represent all of them?

Say, can we just use a tuple (fd, page_array) rather than the udmabuf
itself to do host zero-copy mapping?  The page_array can be e.g. a list of
file offsets that point to the pages (rather than pinning the pages using
FOLL_GET).  The good thing is that the fd can then be the guest memory file
itself.  With that, we can mmap() the shmem/hugetlb file in whatever vma
and whatever process.  Truncation (and actually everything else, e.g. page
migration, swapping, ... which will be disabled if we use PFNMAP pins) will
all just start to work, afaiu.
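
Something like the following is what I have in mind, as a userspace sketch
only.  BufferDesc, read_buffer() and demo() are names invented for the
example, a temporary file stands in for the guest memory file, and each
page is reduced to a 4-byte tag to keep it short.

```python
import mmap
import os
import tempfile
from typing import List, NamedTuple

PAGE = mmap.ALLOCATIONGRANULARITY  # mmap offsets must be multiples of this

class BufferDesc(NamedTuple):
    """Hypothetical (fd, page_array) tuple: the file plus page offsets."""
    fd: int
    page_offsets: List[int]

def read_buffer(desc):
    """Gather the first 4 bytes of every listed page by mapping it by offset."""
    out = bytearray()
    for off in desc.page_offsets:
        # No page is pinned; each access goes through the page cache.
        with mmap.mmap(desc.fd, PAGE, prot=mmap.PROT_READ, offset=off) as m:
            out += m[:4]
    return bytes(out)

def demo():
    with tempfile.TemporaryFile() as f:       # assumption: stand-in for guest memory
        fd = f.fileno()
        os.ftruncate(fd, 8 * PAGE)
        # A buffer made of pages scattered around the file.
        desc = BufferDesc(fd, [5 * PAGE, 1 * PAGE, 6 * PAGE])
        for tag, off in zip((b"pg0!", b"pg1!", b"pg2!"), desc.page_offsets):
            os.pwrite(fd, tag, off)
        before = read_buffer(desc)
        os.pwrite(fd, b"NEW!", 1 * PAGE)      # the "guest" rewrites one page
        after = read_buffer(desc)
    return before, after
```

Since the descriptor only carries offsets, whoever holds it always sees the
file's current contents, from any process that has the fd.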

Thanks,

-- 
Peter Xu