On Mon, May 31, 2021 at 5:53 PM Peter Xu <peterx@xxxxxxxxxx> wrote:
>
> On Mon, May 31, 2021 at 05:40:09PM -0700, akpm@xxxxxxxxxxxxxxxxxxxx wrote:
> >
> > The patch titled
> >      Subject: mm: improve mprotect(R|W) efficiency on pages referenced once
> > has been added to the -mm tree.  Its filename is
> >      mm-improve-mprotectrw-efficiency-on-pages-referenced-once.patch
> >
> > This patch should soon appear at
> >      https://ozlabs.org/~akpm/mmots/broken-out/mm-improve-mprotectrw-efficiency-on-pages-referenced-once.patch
> > and later at
> >      https://ozlabs.org/~akpm/mmotm/broken-out/mm-improve-mprotectrw-efficiency-on-pages-referenced-once.patch
> >
> > Before you just go and hit "reply", please:
> >    a) Consider who else should be cc'ed
> >    b) Prefer to cc a suitable mailing list as well
> >    c) Ideally: find the original patch on the mailing list and do a
> >       reply-to-all to that, adding suitable additional cc's
> >
> > *** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
> >
> > The -mm tree is included into linux-next and is updated
> > there every 3-4 working days
> >
> > ------------------------------------------------------
> > From: Peter Collingbourne <pcc@xxxxxxxxxx>
> > Subject: mm: improve mprotect(R|W) efficiency on pages referenced once
> >
> > In the Scudo memory allocator [1] we would like to be able to detect
> > use-after-free vulnerabilities involving large allocations by issuing
> > mprotect(PROT_NONE) on the memory region used for the allocation when it
> > is deallocated.  Later on, after the memory region has been "quarantined"
> > for a sufficient period of time we would like to be able to use it for
> > another allocation by issuing mprotect(PROT_READ|PROT_WRITE).
> >
> > Before this patch, after removing the write protection, any writes to the
> > memory region would result in page faults and entering the copy-on-write
> > code path, even in the usual case where the pages are only referenced by a
> > single PTE, harming performance unnecessarily.  Make it so that any pages
> > in anonymous mappings that are only referenced by a single PTE are
> > immediately made writable during the mprotect so that we can avoid the
> > page faults.
> >
> > This program shows the critical syscall sequence that we intend to use in
> > the allocator:
> >
> >   #include <string.h>
> >   #include <sys/mman.h>
> >
> >   enum { kSize = 131072 };
> >
> >   int main(int argc, char **argv) {
> >     char *addr = (char *)mmap(0, kSize, PROT_READ | PROT_WRITE,
> >                               MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
> >     for (int i = 0; i != 100000; ++i) {
> >       memset(addr, i, kSize);
> >       mprotect((void *)addr, kSize, PROT_NONE);
> >       mprotect((void *)addr, kSize, PROT_READ | PROT_WRITE);
> >     }
> >   }
> >
> > The effect of this patch on the above program was measured on a
> > DragonBoard 845c by taking the median real time execution time of 10 runs.
> >
> > Before: 2.94s
> > After:  0.66s
> >
> > The effect was also measured using one of the microbenchmarks that we
> > normally use to benchmark the allocator [2], after modifying it to make
> > the appropriate mprotect calls [3].  With an allocation size of 131072
> > bytes to trigger the allocator's "large allocation" code path the
> > per-iteration time was measured as follows:
> >
> > Before: 27450ns
> > After:   6010ns
> >
> > This patch means that we do more work during the mprotect call itself in
> > exchange for less work when the pages are accessed.  In the worst case,
> > the pages are not accessed at all.
> > The effect of this patch in such
> > cases was measured using the following program:
> >
> >   #include <string.h>
> >   #include <sys/mman.h>
> >
> >   enum { kSize = 131072 };
> >
> >   int main(int argc, char **argv) {
> >     char *addr = (char *)mmap(0, kSize, PROT_READ | PROT_WRITE,
> >                               MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
> >     memset(addr, 1, kSize);
> >     for (int i = 0; i != 100000; ++i) {
> > #ifdef PAGE_FAULT
> >       memset(addr + (i * 4096) % kSize, i, 4096);
> > #endif
> >       mprotect((void *)addr, kSize, PROT_NONE);
> >       mprotect((void *)addr, kSize, PROT_READ | PROT_WRITE);
> >     }
> >   }
> >
> > With PAGE_FAULT undefined (0 pages touched after removing write
> > protection) the median real time execution time of 100 runs was measured
> > as follows:
> >
> > Before: 0.330260s
> > After:  0.338836s
> >
> > With PAGE_FAULT defined (1 page touched) the measurements were as
> > follows:
> >
> > Before: 0.438048s
> > After:  0.355661s
> >
> > So it seems that even with a single page fault the new approach is faster.
> >
> > I saw similar results if I adjusted the programs to use a larger mapping
> > size.  With kSize = 1048576 I get these numbers with PAGE_FAULT undefined:
> >
> > Before: 1.428988s
> > After:  1.512016s
> >
> > i.e. the new approach is around 5.5% slower.
> >
> > And these with PAGE_FAULT defined:
> >
> > Before: 1.518559s
> > After:  1.524417s
> >
> > i.e. about the same.
> >
> > What I think we may conclude from these results is that for smaller
> > mappings the advantage of the previous approach, although measurable, is
> > wiped out by a single page fault.  I think we may expect that there should
> > be at least one access resulting in a page fault (under the previous
> > approach) after making the pages writable, since the program presumably
> > made the pages writable for a reason.
> >
> > For larger mappings we may guesstimate that the new approach wins if the
> > density of future page faults is > 0.4%.  But for the mappings that are
> > large enough for density to matter (not just the absolute number of page
> > faults) it doesn't seem like the increase in mprotect latency would be
> > very large relative to the total mprotect execution time.
> >
> > Link: https://lkml.kernel.org/r/20210527190453.1259020-1-pcc@xxxxxxxxxx
> > Link: https://linux-review.googlesource.com/id/I98d75ef90e20330c578871c87494d64b1df3f1b8
> > Link: [1] https://source.android.com/devices/tech/debug/scudo
> > Link: [2] https://cs.android.com/android/platform/superproject/+/master:bionic/benchmarks/stdlib_benchmark.cpp;l=53;drc=e8693e78711e8f45ccd2b610e4dbe0b94d551cc9
> > Link: [3] https://github.com/pcc/llvm-project/commit/scudo-mprotect-secondary2
> > Signed-off-by: Peter Collingbourne <pcc@xxxxxxxxxx>
> > Reviewed-by: Peter Xu <peterx@xxxxxxxxxx>
> > Cc: Kostya Kortchinsky <kostyak@xxxxxxxxxx>
> > Cc: Evgenii Stepanov <eugenis@xxxxxxxxxx>
> > Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
> > Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> > ---
> >
> >  mm/mprotect.c |   29 ++++++++++++++++++++++++-----
> >  1 file changed, 24 insertions(+), 5 deletions(-)
> >
> > --- a/mm/mprotect.c~mm-improve-mprotectrw-efficiency-on-pages-referenced-once
> > +++ a/mm/mprotect.c
> > @@ -35,6 +35,29 @@
> >
> >  #include "internal.h"
> >
> > +static bool may_avoid_write_fault(pte_t pte, struct vm_area_struct *vma,
> > +                                  unsigned long cp_flags)
> > +{
>
> Andrew,
>
> As raised in the original patch, IMHO we'd better add below:
>
>         if (cp_flags & MM_CP_PROT_NUMA)
>                 return false;
>
> Peter Collingbourne, could you confirm?
Done in v5, and added some comments as requested by Andrew on the other
thread.

Peter
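For anyone skimming the thread, the check Peter Xu asks for is simply an early
return at the top of the new helper.  The sketch below only illustrates the
behaviour described in the changelog (make anonymous pages mapped by a single
PTE writable, but never for NUMA-balancing protection updates); apart from the
quoted function signature and the suggested MM_CP_PROT_NUMA guard, every
condition here is an assumption for illustration, not the actual v5 body:

  static bool may_avoid_write_fault(pte_t pte, struct vm_area_struct *vma,
                                    unsigned long cp_flags)
  {
          /*
           * Suggested guard: NUMA-balancing protection changes
           * (MM_CP_PROT_NUMA) must never produce writable PTEs.
           */
          if (cp_flags & MM_CP_PROT_NUMA)
                  return false;

          /*
           * Illustrative check: only private anonymous mappings that are
           * actually gaining write permission are candidates.
           */
          if (!(vma->vm_flags & VM_WRITE) || !vma_is_anonymous(vma))
                  return false;

          /*
           * Illustrative check: the page must be referenced by this PTE
           * alone, otherwise the copy-on-write fault is still required.
           */
          return page_count(pte_page(pte)) == 1;
  }

Presumably the helper is consulted from the PTE-walking loop in mm/mprotect.c
when deciding whether the new protection can include write access immediately;
see the actual patch for the real call site and conditions.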