Re: [patch 075/200] mm: speedup mremap on 1GB or larger regions

Kalesh Singh <kaleshsingh@xxxxxxxxxx> · Tue, 15 Dec 2020 18:16:18 -0500

On Tue, Dec 15, 2020 at 2:59 PM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Mon, Dec 14, 2020 at 7:07 PM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > From: Kalesh Singh <kaleshsingh@xxxxxxxxxx>
> > Subject: mm: speedup mremap on 1GB or larger regions
> >
> > Android needs to move large memory regions for garbage collection.  The GC
> > requires moving physical pages of multi-gigabyte heap using mremap.
> > During this move, the application threads have to be paused for
> > correctness.  It is critical to keep this pause as short as possible to
> > avoid jitters during user interaction.
>
> It would have been good to add a pointer to the PMD case we did earlier..
>
> Also, a few comments on the actual performance in practice would be
> nice. Does this actually *trigger* on Android in practice?
>
> I can well imagine the PMD case triggering easily, but are there
> real-life Android loads that really do gigabyte heaps? That sounds a
> bit odd to me.
>
> So I don't have any complaints about the patch, but I just wonder how
> _realistic_ this actually is, particularly the alleged 13x improvement
> in timing...
>
Hi Linus,

The new GC for Android requires moving the Java heap pages to a
separate location during a stop-the-world pause (when application
threads are paused). Once this is done, with the help of userfaultfd,
the heap can be concurrently compacted while application threads are
making progress.

Given that Android apps are highly sensitive to response time, we
need this pause to be as short as possible (not more than a few
microseconds). The dominant cost during this pause is the mremap
operation, and therefore this optimization is essential.

As of today, the Java heaps on Android are up to 1GB in virtual memory
size. However, the new GC algorithm makes them multi-gigabyte. Even
though the heap's physical memory consumption is not going to be more
than a few hundred MBs, the physical pages will be scattered across
the entire virtual memory range. Therefore, this optimization will
reduce the number of iterations required to finish the mremap
operation.

Example scenario:
Let's say that our heap is 8GB in virtual memory size and 128MB (a
realistic heap occupancy) in physical memory size at the time of a
mremap operation. That means the heap holds 32K physical pages of 4KB
each.

A) Worst case scenario: every 4KB physical page is mapped to a unique PMD entry
1) With only the PMD optimization the loop will make 32K iterations.
2) With the PUD optimization the loop will make 8 iterations.

B) Best case scenario: all pages are compacted into one corner of the heap
1) With only the PMD optimization the loop will make 64 iterations.
2) With the PUD optimization the loop will make only 1 iteration.

Even in the best case we are reducing the number of iterations by 64x.
The optimization will be useful even if not all of the virtual address
range is mapped.
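
The arithmetic in the example scenario can be checked with a small
sketch like the one below. It is only an illustration, not kernel code:
the constants assume 4KB base pages, 2MB per PMD entry and 1GB per PUD
entry, and the helper names are made up for this example. It reproduces
the page count, the best-case counts and the PUD count for the full 8GB
range. The worst-case PMD count depends on how the pages are scattered
and is not modelled here.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB

# Assumed sizes (4KB base pages; one PMD entry covers 2MB, one PUD entry 1GB).
PAGE_SIZE = 4 * KB
PMD_SIZE = 2 * MB
PUD_SIZE = 1 * GB


def ceil_div(a, b):
    """Integer division rounding up."""
    return -(-a // b)


def resident_pages(resident_bytes):
    """Number of base pages needed to back resident_bytes."""
    return ceil_div(resident_bytes, PAGE_SIZE)


def packed_iterations(resident_bytes, extent):
    """Iterations if all resident pages sit contiguously at an aligned start
    and the loop moves one extent-sized chunk per iteration."""
    return ceil_div(resident_bytes, extent)


def full_range_iterations(virtual_bytes, extent):
    """Iterations needed to cover the whole virtual range, one extent at a time."""
    return ceil_div(virtual_bytes, extent)


heap_virtual = 8 * GB     # virtual size from the example scenario
heap_resident = 128 * MB  # physical occupancy from the example scenario

pages = resident_pages(heap_resident)
pmd_best = packed_iterations(heap_resident, PMD_SIZE)
pud_best = packed_iterations(heap_resident, PUD_SIZE)
pud_full = full_range_iterations(heap_virtual, PUD_SIZE)

print(f"resident 4KB pages:        {pages}")
print(f"best case, PMD-level move: {pmd_best} iterations")
print(f"best case, PUD-level move: {pud_best} iteration(s)")
print(f"full range, PUD-level:     {pud_full} iterations")
print(f"best-case reduction:       {pmd_best // pud_best}x")
```

Changing heap_virtual or heap_resident shows how the counts scale for
other heap sizes under the same assumptions.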

Thanks,
Kalesh

>              Linus