On Tue, May 21, 2019 at 08:25:55AM +0530, Anshuman Khandual wrote:
> 
> 
> On 05/20/2019 10:29 PM, Tim Murray wrote:
> > On Sun, May 19, 2019 at 11:37 PM Anshuman Khandual
> > <anshuman.khandual@xxxxxxx> wrote:
> >>
> >> Or is the objective here to reduce the number of processes which get
> >> killed by lmkd, by triggering swapping for the unused (user-hinted)
> >> memory sooner so that they don't get picked by lmkd? Is
> >> under-utilization of the zram hardware a concern here as well?
> > 
> > The objective is to avoid some instances of memory pressure by
> > proactively swapping pages that userspace knows to be cold before
> > those pages reach the end of the LRUs, which in turn can prevent some
> > apps from being killed by lmk/lmkd. As soon as Android userspace knows
> > that an application is not being used and is only resident to improve
> > performance if the user returns to that app, we can kick off
> > process_madvise on that process's pages (or some portion of those
> > pages) in a power-efficient way to reduce memory pressure long before
> > the system hits the free page watermark. This allows the system more
> > time to put pages into zram versus waiting for the watermark to
> > trigger kswapd, which decreases the likelihood that later memory
> > allocations will cause enough pressure to trigger a kill of one of
> > these apps.
> 
> So this opens up a bit of LRU management to user space hints. Also,
> because the app itself won't know about the memory situation of the
> entire system, the new system call needs to be called from an external
> process.

That's why process_madvise is introduced here.

> 
> 
> >> Swapping out memory into zram won't increase the latency for a hot
> >> start? Or is it because it will prevent a fresh cold start, which
> >> anyway will be slower than a slow hot start? Just being curious.
> > 
> > First, not all swapped pages will be reloaded immediately once an app
> > is resumed. We've found that an app's working set post-process_madvise
> > is significantly smaller than what an app allocates when it first
> > launches (see the delta between pswpin and pswpout in Minchan's
> > results). Presumably because of this, faulting to fetch from zram does
> 
> pswpin    417613   1392647    975034   233.00
> pswpout  1274224   2661731   1387507   108.00
> 
> IIUC the swap-in ratio is way higher in comparison to that of swap-out.
> Is that always the case? Or does it tend to swap out from an active area
> of the working set, which then faults back in again?

I think it's because apps stay alive longer once they are killed less
often, so page-ins that would have shown up as pgpgin from a cold start
turn into swap-ins instead.

> 
> > not seem to introduce a noticeable hot start penalty, nor does it
> > cause an increase in performance problems later in the app's
> > lifecycle. I've measured with and without process_madvise, and the
> > differences are within our noise bounds. Second, because we're not
> 
> That is assuming that the post-process_madvise() working set of the
> application is always smaller. There is another challenge: the external
> process should ideally know the active areas of the working set of the
> application in question, in order to invoke process_madvise() correctly
> and prevent such scenarios.

There are several ways to detect the working set more accurately at the
cost of runtime overhead, for example idle page tracking or clear_refs.
Accuracy is always a trade-off against the overhead of LRU aging.
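
As a concrete illustration (a minimal sketch, not part of this series), a
clear_refs-based working-set sampler could look roughly like this; the
10-second sampling interval and the smaps parsing are assumptions of the
example:

/*
 * Minimal sketch, not from this series: estimate how much of a
 * process's memory was referenced during a sampling interval.
 * Writing "1" to /proc/<pid>/clear_refs clears the referenced bits
 * on all mapped pages; the "Referenced:" fields in /proc/<pid>/smaps
 * then approximate the pages touched since the clear. Needs
 * appropriate privileges for the target pid.
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static long referenced_kb(pid_t pid)
{
        char path[64], line[256];
        long kb, total = 0;
        FILE *f;

        snprintf(path, sizeof(path), "/proc/%d/smaps", (int)pid);
        f = fopen(path, "r");
        if (!f)
                return -1;
        while (fgets(line, sizeof(line), f))
                if (sscanf(line, "Referenced: %ld kB", &kb) == 1)
                        total += kb;
        fclose(f);
        return total;
}

int main(int argc, char **argv)
{
        char path[64];
        pid_t pid;
        FILE *f;

        if (argc < 2)
                return 1;
        pid = atoi(argv[1]);

        snprintf(path, sizeof(path), "/proc/%d/clear_refs", (int)pid);
        f = fopen(path, "w");
        if (!f)
                return 1;
        fputs("1", f);          /* clear referenced bits on all pages */
        fclose(f);

        sleep(10);              /* sampling interval, assumed 10s */

        printf("referenced in last 10s: %ld kB\n", referenced_kb(pid));
        return 0;
}

Idle page tracking (/sys/kernel/mm/page_idle/bitmap) gives per-page
accuracy instead of per-VMA totals, but needs pagemap to translate
virtual addresses to PFNs, so it costs more to use; that is the
accuracy-versus-overhead trade-off mentioned above.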
> > preemptively evicting file pages and only making them more likely to
> > be evicted when there's already memory pressure, we avoid the case
> > where we process_madvise an app then immediately return to the app and
> > reload all file pages in the working set even though there was no
> > intervening memory pressure. Our initial version of this work evicted
> 
> That would be the worst-case scenario, which should be avoided. Memory
> pressure must be a parameter before actually doing the swap-out. But
> pages, if known to be inactive/cold, can be marked high priority to be
> swapped out.
> 
> > file pages preemptively and did cause a noticeable slowdown (~15%) for
> > that case; this patch set avoids that slowdown. Finally, the benefit
> > from avoiding cold starts is huge. The performance improvement from
> > having a hot start instead of a cold start ranges from 3x for very
> > small apps to 50x+ for larger apps like high-fidelity games.
> 
> Is there any other real-world scenario, apart from this app-based
> ecosystem, where user-hinted LRU management might be helpful? Just being
> curious. Thanks for the detailed explanation. I will continue looking
> into this series.
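
To make the calling convention concrete, below is a minimal, hedged
sketch of how an external manager process might issue such a hint. The
pidfd-plus-iovec form of process_madvise, the raw syscall numbers, and
the target pid/address range are all assumptions of the example, not
part of this series:

/*
 * Illustrative sketch only: hint that a range in another process is
 * cold. Assumes process_madvise(pidfd, iovec, vlen, advice, flags)
 * with __NR_pidfd_open = 434 and __NR_process_madvise = 440;
 * MADV_PAGEOUT asks the kernel to reclaim the pages (anon pages go
 * to swap, e.g. zram).
 */
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21         /* reclaim these pages now */
#endif

static int hint_cold_range(pid_t pid, void *addr, size_t len)
{
        struct iovec iov = { .iov_base = addr, .iov_len = len };
        long ret;
        int pidfd;

        pidfd = syscall(434 /* __NR_pidfd_open */, pid, 0);
        if (pidfd < 0)
                return -1;

        ret = syscall(440 /* __NR_process_madvise */, pidfd, &iov, 1,
                      MADV_PAGEOUT, 0);
        close(pidfd);
        return ret < 0 ? -1 : 0;
}

int main(void)
{
        /* Hypothetical target: pid 1234, a 1 MB heap range. */
        return hint_cold_range(1234, (void *)0x7f0000000000, 1 << 20);
}

The choice of advice value maps onto the trade-off discussed above: a
MADV_COLD-style hint only deactivates the pages and leaves the final
decision to memory pressure, while MADV_PAGEOUT reclaims them
immediately.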