On Thu, Nov 5, 2020 at 9:35 PM Michal Hocko <mhocko@xxxxxxxx> wrote:
>
> On Thu 05-11-20 21:10:12, Yafang Shao wrote:
> > The memory utilization (Used / Total) is used to monitor the memory
> > pressure by us. If it is too high, it means the system may be OOM sooner
> > or later when swap is off, then we will make adjustment on this system.
> >
> > However, this method is broken since MADV_FREE is introduced, because
> > these lazily freed anonymous pages can be reclaimed under memory
> > pressure while they are still accounted in NR_ANON_MAPPED.
> >
> > Furthermore, since commit f7ad2a6cb9f7 ("mm: move MADV_FREE pages into
> > LRU_INACTIVE_FILE list"), these lazily freed anonymous pages are moved
> > from anon lru list into file lru list. That means
> > (Inactive(file) + Active(file)) may be much larger than Cached in
> > /proc/meminfo. That makes our users confused.
> >
> > So we'd better account the lazily freed anonymous pages in
> > NR_FILE_PAGES as well.
>
> Can you simply subtract lazyfree pages in the userspace?

Could you please tell me how to subtract lazyfree pages in userspace?
Please note that we can't use (pglazyfree - pglazyfreed), because
pglazyfreed is only counted in the regular reclaim path and not in the
process exit path. That means we would have to introduce another
counter like LazyPage....

> I am afraid your
> patch just makes the situation even more muddy. NR_ANON_MAPPED is really
> meant to tell how many anonymous pages are mapped. And MADV_FREE pages
> are mapped until they are freed. NR_*_FILE are reflecting size of LRU
> lists and NR_FILE_PAGES reflects the number of page cache pages but
> madvfree pages are not a page cache. They are aged together with file
> pages but they are not the same thing. Same like shmem pages are page
> cache that is living on anon LRUs.
>
> Confusing? Tricky? Yes, likely. But I do not think we want to bend those
> counters even further.
>
> > Signed-off-by: Yafang Shao <laoar.shao@xxxxxxxxx>
> > Cc: Minchan Kim <minchan@xxxxxxxxxx>
> > Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
> > Cc: Michal Hocko <mhocko@xxxxxxxx>
> > ---
> >  mm/memcontrol.c | 11 +++++++++--
> >  mm/rmap.c       | 26 ++++++++++++++++++--------
> >  mm/swap.c       |  2 ++
> >  mm/vmscan.c     |  2 ++
> >  4 files changed, 31 insertions(+), 10 deletions(-)
> >
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 3dcbf24d2227..217a6f10fa8d 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -5659,8 +5659,15 @@ static int mem_cgroup_move_account(struct page *page,
> >
> >  	if (PageAnon(page)) {
> >  		if (page_mapped(page)) {
> > -			__mod_lruvec_state(from_vec, NR_ANON_MAPPED, -nr_pages);
> > -			__mod_lruvec_state(to_vec, NR_ANON_MAPPED, nr_pages);
> > +			if (!PageSwapBacked(page) && !PageSwapCache(page) &&
> > +			    !PageUnevictable(page)) {
> > +				__mod_lruvec_state(from_vec, NR_FILE_PAGES, -nr_pages);
> > +				__mod_lruvec_state(to_vec, NR_FILE_PAGES, nr_pages);
> > +			} else {
> > +				__mod_lruvec_state(from_vec, NR_ANON_MAPPED, -nr_pages);
> > +				__mod_lruvec_state(to_vec, NR_ANON_MAPPED, nr_pages);
> > +			}
> > +
> >  			if (PageTransHuge(page)) {
> >  				__mod_lruvec_state(from_vec, NR_ANON_THPS,
> >  						   -nr_pages);
> > diff --git a/mm/rmap.c b/mm/rmap.c
> > index 1b84945d655c..690ca7ff2392 100644
> > --- a/mm/rmap.c
> > +++ b/mm/rmap.c
> > @@ -1312,8 +1312,13 @@ static void page_remove_anon_compound_rmap(struct page *page)
> >  	if (unlikely(PageMlocked(page)))
> >  		clear_page_mlock(page);
> >
> > -	if (nr)
> > -		__mod_lruvec_page_state(page, NR_ANON_MAPPED, -nr);
> > +	if (nr) {
> > +		if (PageLRU(page) && PageAnon(page) && !PageSwapBacked(page) &&
> > +		    !PageSwapCache(page) && !PageUnevictable(page))
> > +			__mod_lruvec_page_state(page, NR_FILE_PAGES, -nr);
> > +		else
> > +			__mod_lruvec_page_state(page, NR_ANON_MAPPED, -nr);
> > +	}
> >  }
> >
> >  /**
> > @@ -1341,12 +1346,17 @@ void page_remove_rmap(struct page *page, bool compound)
> >  	if (!atomic_add_negative(-1, &page->_mapcount))
> >  		goto out;
> >
> > -	/*
> > -	 * We use the irq-unsafe __{inc|mod}_zone_page_stat because
> > -	 * these counters are not modified in interrupt context, and
> > -	 * pte lock(a spinlock) is held, which implies preemption disabled.
> > -	 */
> > -	__dec_lruvec_page_state(page, NR_ANON_MAPPED);
> > +	if (PageLRU(page) && PageAnon(page) && !PageSwapBacked(page) &&
> > +	    !PageSwapCache(page) && !PageUnevictable(page)) {
> > +		__dec_lruvec_page_state(page, NR_FILE_PAGES);
> > +	} else {
> > +		/*
> > +		 * We use the irq-unsafe __{inc|mod}_zone_page_stat because
> > +		 * these counters are not modified in interrupt context, and
> > +		 * pte lock(a spinlock) is held, which implies preemption disabled.
> > +		 */
> > +		__dec_lruvec_page_state(page, NR_ANON_MAPPED);
> > +	}
> >
> >  	if (unlikely(PageMlocked(page)))
> >  		clear_page_mlock(page);
> > diff --git a/mm/swap.c b/mm/swap.c
> > index 47a47681c86b..340c5276a0f3 100644
> > --- a/mm/swap.c
> > +++ b/mm/swap.c
> > @@ -601,6 +601,7 @@ static void lru_lazyfree_fn(struct page *page, struct lruvec *lruvec,
> >
> >  		del_page_from_lru_list(page, lruvec,
> >  				       LRU_INACTIVE_ANON + active);
> > +		__mod_lruvec_state(lruvec, NR_ANON_MAPPED, -nr_pages);
> >  		ClearPageActive(page);
> >  		ClearPageReferenced(page);
> >  		/*
> > @@ -610,6 +611,7 @@ static void lru_lazyfree_fn(struct page *page, struct lruvec *lruvec,
> >  		 */
> >  		ClearPageSwapBacked(page);
> >  		add_page_to_lru_list(page, lruvec, LRU_INACTIVE_FILE);
> > +		__mod_lruvec_state(lruvec, NR_FILE_PAGES, nr_pages);
> >
> >  		__count_vm_events(PGLAZYFREE, nr_pages);
> >  		__count_memcg_events(lruvec_memcg(lruvec), PGLAZYFREE,
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 1b8f0e059767..4821124c70f7 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -1428,6 +1428,8 @@ static unsigned int shrink_page_list(struct list_head *page_list,
> >  					goto keep_locked;
> >  				}
> >
> > +				mod_lruvec_page_state(page, NR_ANON_MAPPED, nr_pages);
> > +				mod_lruvec_page_state(page, NR_FILE_PAGES, -nr_pages);
> >  				count_vm_event(PGLAZYFREED);
> >  				count_memcg_page_event(page, PGLAZYFREED);
> >  			} else if (!mapping || !__remove_mapping(mapping, page, true,
> > --
> > 2.18.4
> >
>
> --
> Michal Hocko
> SUSE Labs

--
Thanks
Yafang
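
For reference, the userspace subtraction discussed above might look
roughly like the sketch below. It is only an illustration, not part of
the patch: the helper names (slurp, find_counter, lazyfree_estimate)
are made up here, and the code assumes the usual one-counter-per-line
layout of /proc/vmstat ("name value") and /proc/meminfo ("Name: value
kB"). As noted above, pglazyfreed is not bumped on the process exit
path, so the result should be read as an overestimate of the lazyfree
pages currently present, never as an exact count.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read a whole (small) file such as /proc/vmstat into a NUL-terminated
 * heap buffer. Returns NULL on error; the caller frees the result. */
char *slurp(const char *path)
{
    FILE *f = fopen(path, "r");
    size_t cap = 4096, len = 0, n;
    char *buf, *tmp;

    if (!f)
        return NULL;
    buf = malloc(cap);
    while (buf && (n = fread(buf + len, 1, cap - len - 1, f)) > 0) {
        len += n;
        if (len == cap - 1) {           /* buffer full: double it */
            tmp = realloc(buf, cap *= 2);
            if (!tmp)
                free(buf);
            buf = tmp;
        }
    }
    fclose(f);
    if (buf)
        buf[len] = '\0';
    return buf;
}

/* Find the line starting with "name " or "name:" and parse the number
 * that follows. Returns 0 on success, -1 if the counter is missing. */
int find_counter(const char *text, const char *name, long long *out)
{
    size_t len = strlen(name);
    const char *line = text;

    while (line && *line) {
        if (strncmp(line, name, len) == 0 &&
            (line[len] == ' ' || line[len] == ':')) {
            *out = strtoll(line + len + 1, NULL, 10);
            return 0;
        }
        line = strchr(line, '\n');      /* advance to the next line */
        if (line)
            line++;
    }
    return -1;
}

/* pglazyfree - pglazyfreed, in pages. Overestimates the pages still
 * lazily freed, because pages released at process exit are never
 * counted in pglazyfreed. Returns -1 if either counter is missing. */
long long lazyfree_estimate(const char *vmstat)
{
    long long marked, reclaimed;

    if (find_counter(vmstat, "pglazyfree", &marked) ||
        find_counter(vmstat, "pglazyfreed", &reclaimed))
        return -1;
    return marked - reclaimed;
}
```

A caller would pass the result of slurp("/proc/vmstat") to
lazyfree_estimate(). The same find_counter() can be applied to the
contents of /proc/meminfo with "Cached", "Active(file)" and
"Inactive(file)" to see the gap described in the changelog.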