On Fri, Nov 6, 2020 at 12:24 AM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
>
> On Thu, Nov 05, 2020 at 09:10:12PM +0800, Yafang Shao wrote:
> > We use the memory utilization (Used / Total) to monitor memory
> > pressure. If it is too high, the system may OOM sooner or later
> > when swap is off, so we then make adjustments on that system.
> >
> > However, this method has been broken since MADV_FREE was introduced,
> > because these lazily freed anonymous pages can be reclaimed under
> > memory pressure while they are still accounted in NR_ANON_MAPPED.
> >
> > Furthermore, since commit f7ad2a6cb9f7 ("mm: move MADV_FREE pages into
> > LRU_INACTIVE_FILE list"), these lazily freed anonymous pages are moved
> > from the anon lru list into the file lru list. That means
> > (Inactive(file) + Active(file)) may be much larger than Cached in
> > /proc/meminfo. That confuses our users.
> >
> > So we'd better account the lazily freed anonymous pages in
> > NR_FILE_PAGES as well.
>
> What about the share of pages that have been reused? After all, the
> idea behind deferred reclaim is cheap reuse of already allocated and
> faulted in pages.
>

I missed the reuse case. Thanks for the explanation.

> Anywhere between 0% and 100% of MADV_FREEd pages may be dirty and need
> swap-out to reclaim. That means even after this patch, your formula
> would still have an error margin of 100%.
>
> The tradeoff with saving the reuse fault and relying on the MMU is
> that the kernel simply *cannot do* lazy free accounting. Userspace
> needs to do it. E.g. if a malloc implementation or similar uses
> MADV_FREE, it has to keep track of what is and isn't used and make
> those stats available.
>
> If that's not practical,

That is not practical. The process which uses MADV_FREE can keep track
of it, but other processes, such as monitoring tools, have no easy way
to get at that information. We can't push this burden onto userspace.

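For reference, the allocator-side bookkeeping suggested above could look
roughly like the sketch below. It is only an illustration: the struct and
the stats_*() helpers are hypothetical names, not an existing libc or
kernel interface, and the madvise() calls are only indicated in comments.
It also shows the limitation: the counters live inside the process, so an
external monitor sees them only if the process exports them.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical per-process counters kept by an allocator that uses
 * MADV_FREE. Nothing here is an existing libc or kernel API.
 */
struct lazyfree_stats {
        size_t mapped_bytes;    /* anonymous bytes the allocator has mapped */
        size_t lazyfree_bytes;  /* bytes passed to MADV_FREE and not yet reused */
};

/* Call after a successful mmap() of len anonymous bytes. */
static void stats_map(struct lazyfree_stats *s, size_t len)
{
        s->mapped_bytes += len;
}

/* Call right after madvise(addr, len, MADV_FREE) succeeds. */
static void stats_lazy_free(struct lazyfree_stats *s, size_t len)
{
        assert(s->lazyfree_bytes + len <= s->mapped_bytes);
        s->lazyfree_bytes += len;
}

/* Call before handing a lazily freed range back to the application. */
static void stats_reuse(struct lazyfree_stats *s, size_t len)
{
        assert(len <= s->lazyfree_bytes);
        s->lazyfree_bytes -= len;
}

/* Bytes the allocator considers in use, i.e. not droppable by reclaim. */
static size_t stats_in_use(const struct lazyfree_stats *s)
{
        return s->mapped_bytes - s->lazyfree_bytes;
}
```

The allocator knows which ranges it regards as free, even though it
cannot tell whether the kernel has already reclaimed them.
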
> I don't see an alternative to trapping minor
> faults upon page reuse, eating the additional TLB flush, and doing the
> accounting properly inside the kernel.
>

I will try to analyze the details and find whether there is some way
to track it in the kernel.

> > @@ -1312,8 +1312,13 @@ static void page_remove_anon_compound_rmap(struct page *page)
> >       if (unlikely(PageMlocked(page)))
> >               clear_page_mlock(page);
> >
> > -     if (nr)
> > -             __mod_lruvec_page_state(page, NR_ANON_MAPPED, -nr);
> > +     if (nr) {
> > +             if (PageLRU(page) && PageAnon(page) && !PageSwapBacked(page) &&
> > +                 !PageSwapCache(page) && !PageUnevictable(page))
> > +                     __mod_lruvec_page_state(page, NR_FILE_PAGES, -nr);
> > +             else
> > +                     __mod_lruvec_page_state(page, NR_ANON_MAPPED, -nr);
>
> I don't think this would work. The page can be temporarily off-LRU for
> compaction, migration, reclaim etc. and then you'd misaccount it here.

Right, thanks for the clarification.

-- 
Thanks
Yafang