Re: [PATCH v2] mm: improve mprotect(R|W) efficiency on pages referenced once

Peter Collingbourne <pcc@xxxxxxxxxx> · Thu, 29 Apr 2021 14:44:15 -0700

On Tue, Jan 19, 2021 at 9:17 AM Peter Collingbourne <pcc@xxxxxxxxxx> wrote:
>
> On Tue, Dec 29, 2020 at 4:41 PM Peter Collingbourne <pcc@xxxxxxxxxx> wrote:
> >
> > In the Scudo memory allocator [1] we would like to be able to
> > detect use-after-free vulnerabilities involving large allocations
> > by issuing mprotect(PROT_NONE) on the memory region used for the
> > allocation when it is deallocated. Later on, after the memory
> > region has been "quarantined" for a sufficient period of time we
> > would like to be able to use it for another allocation by issuing
> > mprotect(PROT_READ|PROT_WRITE).
> >
> > Before this patch, after removing the write protection, any writes
> > to the memory region would result in page faults and entering
> > the copy-on-write code path, even in the usual case where the
> > pages are only referenced by a single PTE, harming performance
> > unnecessarily. Make it so that any pages in anonymous mappings that
> > are only referenced by a single PTE are immediately made writable
> > during the mprotect so that we can avoid the page faults.
> >
> > This program shows the critical syscall sequence that we intend to
> > use in the allocator:
> >
> >   #include <string.h>
> >   #include <sys/mman.h>
> >
> >   enum { kSize = 131072 };
> >
> >   int main(int argc, char **argv) {
> >     char *addr = (char *)mmap(0, kSize, PROT_READ | PROT_WRITE,
> >                               MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
> >     for (int i = 0; i != 100000; ++i) {
> >       memset(addr, i, kSize);
> >       mprotect((void *)addr, kSize, PROT_NONE);
> >       mprotect((void *)addr, kSize, PROT_READ | PROT_WRITE);
> >     }
> >   }
> >
> > The effect of this patch on the above program was measured on a
> > DragonBoard 845c by taking the median real time execution time of
> > 10 runs.
> >
> > Before: 3.19s
> > After:  0.79s
> >
> > The effect was also measured using one of the microbenchmarks that
> > we normally use to benchmark the allocator [2], after modifying it
> > to make the appropriate mprotect calls [3]. With an allocation size
> > of 131072 bytes to trigger the allocator's "large allocation" code
> > path the per-iteration time was measured as follows:
> >
> > Before: 33364ns
> > After:   6886ns
> >
> > This patch means that we do more work during the mprotect call itself
> > in exchange for less work when the pages are accessed. In the worst
> > case, the pages are not accessed at all. The effect of this patch in
> > such cases was measured using the following program:
> >
> >   #include <string.h>
> >   #include <sys/mman.h>
> >
> >   enum { kSize = 131072 };
> >
> >   int main(int argc, char **argv) {
> >     char *addr = (char *)mmap(0, kSize, PROT_READ | PROT_WRITE,
> >                               MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
> >     memset(addr, 1, kSize);
> >     for (int i = 0; i != 100000; ++i) {
> >   #ifdef PAGE_FAULT
> >       memset(addr + (i * 4096) % kSize, i, 4096);
> >   #endif
> >       mprotect((void *)addr, kSize, PROT_NONE);
> >       mprotect((void *)addr, kSize, PROT_READ | PROT_WRITE);
> >     }
> >   }
> >
> > With PAGE_FAULT undefined (0 pages touched after removing write
> > protection) the median real time execution time of 100 runs was
> > measured as follows:
> >
> > Before: 0.325928s
> > After:  0.365493s
> >
> > With PAGE_FAULT defined (1 page touched) the measurements were
> > as follows:
> >
> > Before: 0.441516s
> > After:  0.380251s
> >
> > So it seems that even with a single page fault the new approach
> > is faster.
> >
> > I saw similar results if I adjusted the programs to use a larger
> > mapping size. With kSize = 1048576 I get these numbers with PAGE_FAULT
> > undefined:
> >
> > Before: 1.563078s
> > After:  1.607476s
> >
> > i.e. around 3%.
> >
> > And these with PAGE_FAULT defined:
> >
> > Before: 1.684663s
> > After:  1.683272s
> >
> > i.e. about the same.
> >
> > What I think we may conclude from these results is that for smaller
> > mappings the advantage of the previous approach, although measurable,
> > is wiped out by a single page fault. I think we may expect that there
> > should be at least one access resulting in a page fault (under the
> > previous approach) after making the pages writable, since the program
> > presumably made the pages writable for a reason.
> >
> > For larger mappings we may guesstimate that the new approach wins if
> > the density of future page faults is > 0.4%. But for the mappings that
> > are large enough for density to matter (not just the absolute number
> > of page faults) it doesn't seem like the increase in mprotect latency
> > would be very large relative to the total mprotect execution time.
> >
> > Signed-off-by: Peter Collingbourne <pcc@xxxxxxxxxx>
> > Link: https://linux-review.googlesource.com/id/I98d75ef90e20330c578871c87494d64b1df3f1b8
> > Link: [1] https://source.android.com/devices/tech/debug/scudo
> > Link: [2] https://cs.android.com/android/platform/superproject/+/master:bionic/benchmarks/stdlib_benchmark.cpp;l=53;drc=e8693e78711e8f45ccd2b610e4dbe0b94d551cc9
> > Link: [3] https://github.com/pcc/llvm-project/commit/scudo-mprotect-secondary
>
> Hi Andrew,
>
> We had some test failures on Android that were tracked down to this
> patch, so it should probably be backed out of -mm until the problem is
> resolved.

I was finally able to find some time to debug this. The issue turned
out to be caused by a missing check that the page is dirty, as in the
previous if statement. This at least seems necessary in order to
ensure that softdirty continues to be handled correctly. My fix was to
merge the check into the previous if statement, and with that I ran
the same Android test suite that failed last time and didn't see any
failures. I also reran my performance measurements, and they are in
line with what I previously measured. I will post a v3 with the fix
and the new measurement results.

Peter