Re: [EXT] Re: COW in userspace


 



On 23.08.21 12:16, Ralf Ramsauer wrote:


On 23/08/2021 10:02, David Hildenbrand wrote:
On 20.08.21 15:13, Ralf Ramsauer wrote:
Dear mm folks,

I have an issue where it would be great to have a COW-backed virtual
memory area within a userspace process. I know it's possible to have a
file-backed MAP_SHARED VMA, which is later duplicated with MAP_PRIVATE,
but that's not exactly what I'm looking for.
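
For reference, the file-backed variant mentioned above might look roughly like the following. This is only a sketch under assumptions: the temporary file created with mkstemp() and the helper names make_shared() and dup_private() are illustrative choices, not anything prescribed in this thread.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Create an unlinked temporary file of `len` bytes and map it MAP_SHARED.
 * The path template is an arbitrary choice for this sketch.
 * Returns NULL on failure; on success *fd_out holds the open file. */
char *make_shared(size_t len, int *fd_out)
{
    char path[] = "/tmp/cow-demo-XXXXXX";
    int fd;
    char *p;

    assert(len > 0);
    fd = mkstemp(path);
    if (fd < 0)
        return NULL;
    unlink(path); /* the file lives on until the last reference is gone */
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return NULL;
    }
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return NULL;
    }
    *fd_out = fd;
    return p;
}

/* Map the same file again, this time MAP_PRIVATE: writes through the
 * returned pointer are copy-on-write and never reach the file. */
char *dup_private(int fd, size_t len)
{
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);

    return p == MAP_FAILED ? NULL : p;
}
```

Writes through the private mapping stay private, but this is not a true snapshot: mmap(2) leaves it unspecified whether later writes through the shared mapping become visible in pages the private mapping has not yet touched.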

Say I have an anonymous page-aligned VMA a, with MAP_PRIVATE and
PROT_RW. Userspace happily writes to/reads from it. At some point in
time, I want to 'snapshot' that single VMA within the context of the
process and without the need to fork(). Say there's something like

    a = mmap(0, len, PROT_RW, MAP_ANON | MAP_POPULATE, -1, 0);
    [... fill a ...]

    b = mmdup(a, len, PROT_READ);

b shall be the new base pointer of a new VMA that is backed by COW
mechanisms. After mmdup, those regular COW mechanisms do the rest: both
VMAs (a and b) will fault on subsequent writes and duplicate the
previously shared physical mapping, pretty much what cow_fault or
shared_fault does.

AFAICT, this, or at least something like it, is currently not supported
by the kernel. Is that correct? If so, why? Generally speaking, is it a
bad idea?

Not sure if it helps (most probably not), but QEMU uses uffd-wp for
background snapshots of VM memory. It's different, though, as you'll
only have a single mapping and will be catching modifications to that
mapping, such that you can save away the relevant snapshot pages
before they are modified.

Thanks for the pointer, David. I'll have a look.


You mention "both VMAs (a and b) will fault on subsequent writes", so
would you actually be allowing PROT_WRITE access to b ("snapshot")?


In general, yes, both should be allowed to be PROT_WRITE. So no matter
which side causes the fault, either will lead to duplication.

If it would make things easier, then it would also be absolutely fine to
have the snapshot PROT_READ, which would satisfy my requirements as well.

I recall that Redis has very similar requirements for live snapshotting. As I was told, they used to handle it via fork(), just as you described. I don't know whether they have already switched to uffd-wp, but I would guess they have, because they were another excellent use case for uffd-wp:

https://lists.gnu.org/archive/html/qemu-devel/2016-10/msg02955.html

You can handle COW manually in user space that way, by:

1. Creating a second anonymous mapping
2. Registering a UFFD-WP handler on the original mapping
3. WP-protecting the original mapping via UFFD
4. Tracking in a bitmap which pages were already copied

So when you get notified about a WP event, you copy the page manually to the second mapping, un-protect the page, and remember in the bitmap that the page has been copied.
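
The steps above could be sketched in C roughly as follows. This is a minimal sketch under assumptions, not a tested recipe: it assumes a kernel and headers with uffd-wp support for anonymous memory (Linux 5.7 or later), sufficient privileges to call userfaultfd(), and an original mapping that is page-aligned and already populated (e.g. via MAP_POPULATE). struct snap, snap_begin() and snap_handle_one() are made-up names, and error handling is reduced to returning -1.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <limits.h>
#include <linux/userfaultfd.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical bookkeeping for one snapshot; all names are illustrative. */
struct snap {
    int uffd;
    char *orig;            /* live mapping, write-protected via uffd-wp */
    char *copy;            /* second anonymous mapping for saved pages */
    unsigned long *bitmap; /* bit i set: page i was already copied */
    size_t len, page;
};

/* Apply (wp != 0) or remove (wp == 0) write protection on a range. */
static int snap_wp(int uffd, void *start, size_t len, int wp)
{
    struct uffdio_writeprotect arg = {
        .range = { .start = (uintptr_t)start, .len = len },
        .mode = wp ? UFFDIO_WRITEPROTECT_MODE_WP : 0,
    };
    return ioctl(uffd, UFFDIO_WRITEPROTECT, &arg);
}

/* Steps 1-3: second mapping, WP registration, write-protecting `orig`.
 * Returns 0 on success, -1 if any step fails (e.g. no uffd-wp support). */
int snap_begin(struct snap *s, char *orig, size_t len)
{
    struct uffdio_api api = { .api = UFFD_API,
                              .features = UFFD_FEATURE_PAGEFAULT_FLAG_WP };
    struct uffdio_register reg = {
        .range = { .start = (uintptr_t)orig, .len = len },
        .mode = UFFDIO_REGISTER_MODE_WP,
    };

    s->page = (size_t)sysconf(_SC_PAGESIZE);
    assert(len > 0 && len % s->page == 0);
    s->orig = orig;
    s->len = len;
    s->uffd = (int)syscall(SYS_userfaultfd, O_CLOEXEC);
    if (s->uffd < 0)
        return -1;
    s->copy = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    s->bitmap = calloc((len / s->page + WORD_BITS - 1) / WORD_BITS,
                       sizeof(unsigned long));
    if (s->copy != MAP_FAILED && s->bitmap &&
        ioctl(s->uffd, UFFDIO_API, &api) == 0 &&
        ioctl(s->uffd, UFFDIO_REGISTER, &reg) == 0 &&
        snap_wp(s->uffd, orig, len, 1) == 0)
        return 0;
    if (s->copy != MAP_FAILED)
        munmap(s->copy, len);
    free(s->bitmap);
    close(s->uffd);
    return -1;
}

/* Step 4: block for one WP event, save the page, un-protect it. */
int snap_handle_one(struct snap *s)
{
    struct uffd_msg msg;
    size_t off, idx;

    if (read(s->uffd, &msg, sizeof(msg)) != (ssize_t)sizeof(msg))
        return -1;
    if (msg.event != UFFD_EVENT_PAGEFAULT ||
        !(msg.arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WP))
        return -1;
    off = (size_t)(msg.arg.pagefault.address - (uintptr_t)s->orig);
    off &= ~(s->page - 1); /* round down to the start of the page */
    idx = off / s->page;
    if (!(s->bitmap[idx / WORD_BITS] & (1UL << (idx % WORD_BITS)))) {
        memcpy(s->copy + off, s->orig + off, s->page);
        s->bitmap[idx / WORD_BITS] |= 1UL << (idx % WORD_BITS);
    }
    /* Removing the protection also wakes the thread that faulted. */
    return snap_wp(s->uffd, s->orig + off, s->page, 0);
}
```

snap_handle_one() would have to run in a loop on a dedicated thread, because the thread writing to the original mapping stays blocked until the protection on that page is removed.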

When reading the snapshot, you have to check the bitmap to figure out whether to read a specific page from the original or from the second mapping. You won't be able to just read the second mapping, though. (The question is whether that is really required or can be worked around.)
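
The read side might look something like the sketch below. It assumes one bit per page in the bitmap (bit set means "already copied") and that both mappings cover the same range; struct snapshot, page_copied() and snapshot_read() are hypothetical names. Synchronization against a concurrently running WP handler is deliberately left out.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical read-side view of the snapshot state. */
struct snapshot {
    const char *orig;            /* live mapping, may have been modified */
    const char *copy;            /* pages saved before their first write */
    const unsigned long *bitmap; /* bit i set: page i lives in `copy` */
    size_t page_size;
};

/* Nonzero if page `idx` was copied to the second mapping. */
int page_copied(const struct snapshot *s, size_t idx)
{
    return (int)((s->bitmap[idx / BITS_PER_WORD] >> (idx % BITS_PER_WORD)) & 1UL);
}

/* Copy `n` bytes of the snapshot, starting at byte offset `off`, to `dst`.
 * The source is chosen page by page according to the bitmap. */
void snapshot_read(const struct snapshot *s, size_t off, void *dst, size_t n)
{
    char *out = dst;

    assert(s->page_size > 0);
    while (n > 0) {
        size_t idx = off / s->page_size;
        size_t chunk = s->page_size - off % s->page_size;
        const char *src = page_copied(s, idx) ? s->copy : s->orig;

        if (chunk > n)
            chunk = n;
        memcpy(out, src + off, chunk);
        out += chunk;
        off += chunk;
        n -= chunk;
    }
}
```

Untouched pages are read straight from the original, so the second mapping only ever holds the pages that were modified after the snapshot was taken.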

--
Thanks,

David / dhildenb





