Re: [PATCH v2] mm: improve mprotect(R|W) efficiency on pages referenced once

Peter Collingbourne <pcc@xxxxxxxxxx> · Tue, 19 Jan 2021 09:17:13 -0800

On Tue, Dec 29, 2020 at 4:41 PM Peter Collingbourne <pcc@xxxxxxxxxx> wrote:
>
> In the Scudo memory allocator [1] we would like to be able to
> detect use-after-free vulnerabilities involving large allocations
> by issuing mprotect(PROT_NONE) on the memory region used for the
> allocation when it is deallocated. Later on, after the memory
> region has been "quarantined" for a sufficient period of time we
> would like to be able to use it for another allocation by issuing
> mprotect(PROT_READ|PROT_WRITE).
>
> Before this patch, after removing the write protection, any writes
> to the memory region would result in page faults and entering
> the copy-on-write code path, even in the usual case where the
> pages are only referenced by a single PTE, harming performance
> unnecessarily. Make it so that any pages in anonymous mappings that
> are only referenced by a single PTE are immediately made writable
> during the mprotect so that we can avoid the page faults.
>
> This program shows the critical syscall sequence that we intend to
> use in the allocator:
>
>   #include <string.h>
>   #include <sys/mman.h>
>
>   enum { kSize = 131072 };
>
>   int main(int argc, char **argv) {
>     char *addr = (char *)mmap(0, kSize, PROT_READ | PROT_WRITE,
>                               MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
>     for (int i = 0; i != 100000; ++i) {
>       memset(addr, i, kSize);
>       mprotect((void *)addr, kSize, PROT_NONE);
>       mprotect((void *)addr, kSize, PROT_READ | PROT_WRITE);
>     }
>   }
>
> The effect of this patch on the above program was measured on a
> DragonBoard 845c by taking the median real time execution time of
> 10 runs.
>
> Before: 3.19s
> After:  0.79s
>
> The effect was also measured using one of the microbenchmarks that
> we normally use to benchmark the allocator [2], after modifying it
> to make the appropriate mprotect calls [3]. With an allocation size
> of 131072 bytes to trigger the allocator's "large allocation" code
> path the per-iteration time was measured as follows:
>
> Before: 33364ns
> After:   6886ns
>
> This patch means that we do more work during the mprotect call itself
> in exchange for less work when the pages are accessed. In the worst
> case, the pages are not accessed at all. The effect of this patch in
> such cases was measured using the following program:
>
>   #include <string.h>
>   #include <sys/mman.h>
>
>   enum { kSize = 131072 };
>
>   int main(int argc, char **argv) {
>     char *addr = (char *)mmap(0, kSize, PROT_READ | PROT_WRITE,
>                               MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
>     memset(addr, 1, kSize);
>     for (int i = 0; i != 100000; ++i) {
>   #ifdef PAGE_FAULT
>       memset(addr + (i * 4096) % kSize, i, 4096);
>   #endif
>       mprotect((void *)addr, kSize, PROT_NONE);
>       mprotect((void *)addr, kSize, PROT_READ | PROT_WRITE);
>     }
>   }
>
> With PAGE_FAULT undefined (0 pages touched after removing write
> protection) the median real time execution time of 100 runs was
> measured as follows:
>
> Before: 0.325928s
> After:  0.365493s
>
> With PAGE_FAULT defined (1 page touched) the measurements were
> as follows:
>
> Before: 0.441516s
> After:  0.380251s
>
> So it seems that even with a single page fault the new approach
> is faster.
>
> I saw similar results if I adjusted the programs to use a larger
> mapping size. With kSize = 1048576 I get these numbers with PAGE_FAULT
> undefined:
>
> Before: 1.563078s
> After:  1.607476s
>
> i.e. around 3%.
>
> And these with PAGE_FAULT defined:
>
> Before: 1.684663s
> After:  1.683272s
>
> i.e. about the same.
>
> What I think we may conclude from these results is that for smaller
> mappings the advantage of the previous approach, although measurable,
> is wiped out by a single page fault. I think we may expect that there
> should be at least one access resulting in a page fault (under the
> previous approach) after making the pages writable, since the program
> presumably made the pages writable for a reason.
>
> For larger mappings we may guesstimate that the new approach wins if
> the density of future page faults is > 0.4%. But for the mappings that
> are large enough for density to matter (not just the absolute number
> of page faults) it doesn't seem like the increase in mprotect latency
> would be very large relative to the total mprotect execution time.
>
> Signed-off-by: Peter Collingbourne <pcc@xxxxxxxxxx>
> Link: https://linux-review.googlesource.com/id/I98d75ef90e20330c578871c87494d64b1df3f1b8
> Link: [1] https://source.android.com/devices/tech/debug/scudo
> Link: [2] https://cs.android.com/android/platform/superproject/+/master:bionic/benchmarks/stdlib_benchmark.cpp;l=53;drc=e8693e78711e8f45ccd2b610e4dbe0b94d551cc9
> Link: [3] https://github.com/pcc/llvm-project/commit/scudo-mprotect-secondary

Hi Andrew,

We had some test failures on Android that were tracked down to this
patch, so it should probably be backed out of -mm until the problem is
resolved.

Peter