Re: [PATCH v3] mm: add mremap flag for preserving the old mapping

On Sep 30, 2014 2:36 AM, "Daniel Micay" <danielmicay@xxxxxxxxx> wrote:
>
> On 30/09/14 01:53 AM, Andy Lutomirski wrote:
> > On Mon, Sep 29, 2014 at 9:55 PM, Daniel Micay <danielmicay@xxxxxxxxx> wrote:
> >> This introduces the MREMAP_RETAIN flag for preserving the source mapping
> >> when MREMAP_MAYMOVE moves the pages to a new destination. Accesses to
> >> the source location will fault and cause fresh pages to be mapped in.
> >>
> >> For consistency, the old_len >= new_len case could decommit the pages
> >> instead of unmapping. However, userspace can accomplish the same thing
> >> via madvise and a coherent definition of the flag is possible without
> >> the extra complexity.
> >
> > IMO this needs very clear documentation of exactly what it does.
>
> Agreed, and thanks for the review. I'll post a slightly modified version
> of the patch soon (mostly more commit message changes).
>
> > Does it preserve the contents of the source pages?  (If so, why?
> > Aren't you wasting a bunch of time on page faults and possibly
> > unnecessary COWs?)
>
> The source will act as if it was just created. For an anonymous memory
> mapping, it will fault on any accesses and bring in new zeroed pages.
>
> In jemalloc, it replaces an enormous memcpy(dst, src, size) followed by
> madvise(src, size, MADV_DONTNEED) with mremap. Using mremap also ends up
> eliding page faults from writes at the destination.
>
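For readers following along, the copy-and-purge sequence described above can be sketched roughly as below. This is only an illustration under stated assumptions: move_by_copy is a hypothetical helper name, not jemalloc code, and it assumes a private anonymous source mapping whose size is a multiple of the page size.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper (not taken from jemalloc): move `size` bytes from
 * `src` into a fresh anonymous mapping the slow way, then purge the
 * source pages. Returns the new mapping, or MAP_FAILED on error. */
static void *move_by_copy(void *src, size_t size)
{
    assert(src != NULL && size > 0);

    void *dst = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (dst == MAP_FAILED)
        return MAP_FAILED;

    /* Writes every destination byte, faulting the destination pages in. */
    memcpy(dst, src, size);

    /* Drop the source pages; for private anonymous memory the next
     * access faults in zero-filled pages. */
    if (madvise(src, size, MADV_DONTNEED) != 0) {
        munmap(dst, size);
        return MAP_FAILED;
    }
    return dst;
}
```

After a successful call the source range is still mapped but reads back as zeroes, which is the state the proposed flag is meant to reach without the copy.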
> TCMalloc has nearly the same page allocation design, although it tries
> to throttle the purging so it won't always gain as much.
>
> > Does it work on file mappings?  Can it extend file mappings while it moves them?
>
> It works on file mappings. If a move occurs, there will be the usual
> extended destination mapping but with the source mapping left intact.
>
> It wouldn't be useful with existing allocators, but in theory a general
> purpose allocator could expose an MMIO API in order to reuse the same
> address space via MAP_FIXED/MREMAP_FIXED to reduce VM fragmentation.
>
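As a rough illustration of the address-space reuse mentioned above, using only flags that exist today, a mapping can be moved onto a range the caller already owns with MREMAP_FIXED. The helper name move_onto is hypothetical, and the caller is assumed to pass page-aligned, non-overlapping ranges; anonymous or file-backed mappings are handled by the same call.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/mman.h>

/* Hypothetical helper: move the mapping at `old` (`size` bytes) onto the
 * range starting at `target`, which the caller already owns. Any mapping
 * currently at `target` is replaced. Returns `target` on success or
 * MAP_FAILED on error. */
static void *move_onto(void *old, size_t size, void *target)
{
    assert(old != target);

    /* MREMAP_FIXED must be combined with MREMAP_MAYMOVE. */
    return mremap(old, size, size, MREMAP_MAYMOVE | MREMAP_FIXED, target);
}
```

Without the proposed flag the old range is unmapped by this call, so an allocator that wants to keep it has to map it again afterwards.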
> > If you MREMAP_RETAIN a partially COWed private mapping, what happens?
>
> The original mapping is zeroed in the following test, as it would be
> without fork:
>
> #define _GNU_SOURCE
>
> #include <string.h>
> #include <stdlib.h>
> #include <sys/mman.h>
> #include <unistd.h>
> #include <sys/wait.h>
>
> int main(void) {
>   size_t size = 1024 * 1024;
>   char *orig = mmap(NULL, size, PROT_READ|PROT_WRITE,
>                     MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
>   memset(orig, 5, size);
>   int pid = fork();
>   if (pid == -1)
>     return 1;
>   if (pid == 0) {
>     /* write to the first page in the child so that it gets COWed */
>     memset(orig, 5, 1024);
>     /* 4 is the new flag (MREMAP_RETAIN), not yet in the headers */
>     char *new = mremap(orig, size, size * 128, MREMAP_MAYMOVE|4);
>     if (new == MAP_FAILED || new == orig) return 1;
>     for (size_t i = 0; i < size; i++)
>       if (new[i] != 5)
>         return 1;
>     for (size_t i = 0; i < size; i++)
>       if (orig[i] != 0)
>         return 1;
>     return 0;
>   }
>   int status;
>   if (wait(&status) == -1) return 1;
>   if (WIFEXITED(status))
>     return WEXITSTATUS(status);
>   return 1;
> }
>
> Hopefully this is the case you're referring to. :)

What about private file mappings?

>
> > Does it work on special mappings?  If so, please prevent it from doing
> > so.  mremapping x86's vdso is a thing, and duplicating x86's vdso
> > should not become a thing, because x86_32 in particular will become
> > extremely confused.
>
> I'll add a check for arch_vma_name(vma) == NULL.

Careful!  That function is deprecated in favor of vm_ops->name.

I think it might pay to add an explicit vm_op to authorize
duplication, especially for non-cow mappings.  IOW this kind of
extension seems quite magical for anything that doesn't have the
normal COW semantics, including for plain old read-only mappings.

>
> There's an existing check for VM_DONTEXPAND | VM_PFNMAP when expanding
> allocations (the only case this flag impacts). Are there other kinds of
> special mappings that you're referring to?

I was referring to special mappings in the install_special_mapping
sense.  Those may or may not have VM_PFNMAP set.

If VM_DONTEXPAND blocks this new feature entirely, that's probably good.

--Andy
