Re: [PATCH RFC] mm/userfaultfd: enable writenotify while userfaultfd-wp is enabled for a VMA

David Hildenbrand <david@xxxxxxxxxx> · Wed, 7 Dec 2022 21:10:41 +0100

For example, libvhost-user.c in QEMU uses for ordinary postcopy:

         /*
          * In postcopy we're using PROT_NONE here to catch anyone
          * accessing it before we userfault.
          */
         mmap_addr = mmap(0, dev_region->size + dev_region->mmap_offset,
                          PROT_NONE, MAP_SHARED | MAP_NORESERVE,
                          vmsg->fds[0], 0);

I assume this is for missing mode only.  More on wr-protect mode below.

Personally I don't see immediately on whether this is needed.  If the
process itself is trusted then it should be under control of anyone who
will be accessing the pages..  If the other threads are not trusted, then
there's no way to stop anyone from mprotect(RW) after mprotect(NONE)
anyway..

I think there is a difference between code that can read/write memory 
(e.g., rings/buffers in libvhost-user.c, where I think this was added to 
detect such early access) and code that can execute arbitrary mprotect() 
to voluntarily break the system. I think that's the whole reason 
libvhost-user.c went that direction.

So I may not really get the gut of it.

Another way to make sure no one access it is right after receiving the
memory range from QEMU (VhostUserMemoryRegion), if VuDev.postcopy_listening
is set, then we register the range with UFFD missing immediately.  After
all if postcopy_listening is set it means we passed the advise phase
already (VHOST_USER_POSTCOPY_ADVISE). Any potential access will be blocked
until QEMU starts to read on that uffd.

I'd imagine, when using uffd-wp (VM snapshotting with shmem?) one might use
PROT_READ instead before the write-protection is properly set. Because read
access would be fine in the meantime.

It'll be different for wr-protect IIUC, because unlike missing protections,
we don't worry about writes happening before UFFDIO_WRITEPROTECT.

IMHO the solo thing the vhost-user proc needs to do is one
UFFDIO_WRITEPROTECT for each of the range when QEMU tells it to, then it'll
be fine.  Pre-writes are fine.

Sorry I probably went a bit off-topic.  I just want to make sure I don't
miss any real use case of having mprotect being useful for uffd-wp being
there, because that used to be a grey area for me.

But I'm just pulling use cases out of my magic hat ;) Nothing stops user
space from doing things that are not clearly forbidden (well, even then
users might complain, but that's a different story).

Yes, I think those are always fine but the user just cannot assume it'll
work as they assumed how it will work.

If "doing things that are not clearly forbidden" triggers a host warning or
crash that's a bug, OTOH if the outcome is limited to the process itself
then from kernel pov I think we're good.  I used to even thought about
forbid mprotect() on uffd-wp but I'm not sure whether it's good idea either.

Let's see whether I missed something above, if so I'll rethink.

Let's not get distracted too much. As a reminder, I wrote that test case 
to showcase that other kernel code behaves just like the migration code 
does. It was the long hanging fruit to make a point, I'm happy to 
exclude it for now.

Now, my 2 cents on the whole topic regarding "supported", "not 
supported" etc:

(1) If something is not supported we should bail out or at least warn
    the user. I'm pretty sure there are other uffd-wp dummy users like
    me. Skimming over the man userfaultfd page nothing in particular
    regarding PROT_WRITE, mprotect(), ... maybe I looked at the wrong
    page.
(2) If something is easy to support, support it instead of having all
    these surprises for users and having to document them and warn the
    user. Makes all these discussions regarding what's supported, what's
    a valid use case, etc ... much easier.
(3) Inconsistency confuses users. If something used to work for anon,
    in an ideal world, we make shmem behave in a similar, non-surprising
    way.
(4) There are much smarter people like me with much more advanced
    magical hats. I'm pretty sure they will come up with use cases that
    I am not even able to anticipate right now.
(5) Users will make any imaginable mistake possible and point at the
    doc, that nothing speaks against it and that the system didn't bail
    out.

Again, just my 2 cents. Maybe the dos and don'ts of userfaultfd-wp are 
properly documented already and we just don't bail out.

--
Thanks,

David / dhildenb