Re: [RFC 0/7] introduce memory hinting API for external process

Anshuman Khandual <anshuman.khandual@xxxxxxx> · Tue, 21 May 2019 08:25:55 +0530

On 05/20/2019 10:29 PM, Tim Murray wrote:
> On Sun, May 19, 2019 at 11:37 PM Anshuman Khandual
> <anshuman.khandual@xxxxxxx> wrote:
>>
>> Or Is the objective here is reduce the number of processes which get killed by
>> lmkd by triggering swapping for the unused memory (user hinted) sooner so that
>> they dont get picked by lmkd. Under utilization for zram hardware is a concern
>> here as well ?
> 
> The objective is to avoid some instances of memory pressure by
> proactively swapping pages that userspace knows to be cold before
> those pages reach the end of the LRUs, which in turn can prevent some
> apps from being killed by lmk/lmkd. As soon as Android userspace knows
> that an application is not being used and is only resident to improve
> performance if the user returns to that app, we can kick off
> process_madvise on that process's pages (or some portion of those
> pages) in a power-efficient way to reduce memory pressure long before
> the system hits the free page watermark. This allows the system more
> time to put pages into zram versus waiting for the watermark to
> trigger kswapd, which decreases the likelihood that later memory
> allocations will cause enough pressure to trigger a kill of one of
> these apps.

So this opens up bit of LRU management to user space hints. Also because the app
in itself wont know about the memory situation of the entire system, new system
call needs to be called from an external process.

> 
>> Swapping out memory into zram wont increase the latency for a hot start ? Or
>> is it because as it will prevent a fresh cold start which anyway will be slower
>> than a slow hot start. Just being curious.
> 
> First, not all swapped pages will be reloaded immediately once an app
> is resumed. We've found that an app's working set post-process_madvise
> is significantly smaller than what an app allocates when it first
> launches (see the delta between pswpin and pswpout in Minchan's
> results). Presumably because of this, faulting to fetch from zram does

pswpin      417613    1392647     975034     233.00
pswpout    1274224    2661731    1387507     108.00

IIUC the swap-in ratio is way higher in comparison to that of swap out. Is that
always the case ? Or it tend to swap out from an active area of the working set
which faulted back again.

> not seem to introduce a noticeable hot start penalty, not does it
> cause an increase in performance problems later in the app's
> lifecycle. I've measured with and without process_madvise, and the
> differences are within our noise bounds. Second, because we're not

That is assuming that post process_madvise() working set for the application is
always smaller. There is another challenge. The external process should ideally
have the knowledge of active areas of the working set for an application in
question for it to invoke process_madvise() correctly to prevent such scenarios.

> preemptively evicting file pages and only making them more likely to
> be evicted when there's already memory pressure, we avoid the case
> where we process_madvise an app then immediately return to the app and
> reload all file pages in the working set even though there was no
> intervening memory pressure. Our initial version of this work evicted

That would be the worst case scenario which should be avoided. Memory pressure
must be a parameter before actually doing the swap out. But pages if know to be
inactive/cold can be marked high priority to be swapped out.

> file pages preemptively and did cause a noticeable slowdown (~15%) for
> that case; this patch set avoids that slowdown. Finally, the benefit
> from avoiding cold starts is huge. The performance improvement from
> having a hot start instead of a cold start ranges from 3x for very
> small apps to 50x+ for larger apps like high-fidelity games.

Is there any other real world scenario apart from this app based ecosystem where
user hinted LRU management might be helpful ? Just being curious. Thanks for the
detailed explanation. I will continue looking into this series.