Re: [PATCH] mm: account lazily freed anon pages in NR_FILE_PAGES

On Thu 05-11-20 21:10:12, Yafang Shao wrote:
> We use the memory utilization (Used / Total) to monitor memory
> pressure. If it is too high, the system may hit OOM sooner or later
> when swap is off, and we then make adjustments on that system.
> 
> However, this method has been broken since MADV_FREE was introduced,
> because these lazily freed anonymous pages can be reclaimed under
> memory pressure while they are still accounted in NR_ANON_MAPPED.
> 
> Furthermore, since commit f7ad2a6cb9f7 ("mm: move MADV_FREE pages into
> LRU_INACTIVE_FILE list"), these lazily freed anonymous pages are moved
> from the anon LRU list into the file LRU list. That means
> (Inactive(file) + Active(file)) may be much larger than Cached in
> /proc/meminfo, which confuses our users.
> 
> So we'd better account the lazily freed anonymous pages in
> NR_FILE_PAGES as well.

Can you simply subtract the lazyfree pages in userspace? I am afraid your
patch just makes the situation even muddier. NR_ANON_MAPPED is really
meant to tell how many anonymous pages are mapped, and MADV_FREE pages
remain mapped until they are freed. NR_*_FILE reflects the size of the
LRU lists, and NR_FILE_PAGES reflects the number of page cache pages, but
MADV_FREE pages are not page cache. They are aged together with file
pages, but they are not the same thing, much like shmem pages are page
cache yet live on the anon LRUs.

Confusing? Tricky? Yes, likely. But I do not think we want to bend those
counters even further.
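
For illustration, a userspace monitor could do that subtraction itself
along these lines (a rough sketch only; the LazyFree field is exported
in /proc/meminfo, but the particular "used" formula below, MemTotal -
MemFree - Buffers - Cached, is just the classic definition and should be
replaced by whatever your monitoring already computes):

#include <stdio.h>

int main(void)
{
	unsigned long long total = 0, memfree = 0, buffers = 0, cached = 0;
	unsigned long long lazyfree = 0, used;
	char line[256];
	FILE *f = fopen("/proc/meminfo", "r");

	if (!f)
		return 1;
	while (fgets(line, sizeof(line), f)) {
		/* all values in /proc/meminfo are reported in kB */
		sscanf(line, "MemTotal: %llu", &total);
		sscanf(line, "MemFree: %llu", &memfree);
		sscanf(line, "Buffers: %llu", &buffers);
		sscanf(line, "Cached: %llu", &cached);
		sscanf(line, "LazyFree: %llu", &lazyfree);
	}
	fclose(f);

	/* classic "used", minus the lazily freed pages (they are reclaimable) */
	used = total - memfree - buffers - cached - lazyfree;
	printf("util = %.1f%% (used %llu kB of %llu kB)\n",
	       total ? 100.0 * used / total : 0.0, used, total);
	return 0;
}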

> Signed-off-by: Yafang Shao <laoar.shao@xxxxxxxxx>
> Cc: Minchan Kim <minchan@xxxxxxxxxx>
> Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
> Cc: Michal Hocko <mhocko@xxxxxxxx>
> ---
>  mm/memcontrol.c | 11 +++++++++--
>  mm/rmap.c       | 26 ++++++++++++++++++--------
>  mm/swap.c       |  2 ++
>  mm/vmscan.c     |  2 ++
>  4 files changed, 31 insertions(+), 10 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 3dcbf24d2227..217a6f10fa8d 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5659,8 +5659,15 @@ static int mem_cgroup_move_account(struct page *page,
>  
>  	if (PageAnon(page)) {
>  		if (page_mapped(page)) {
> -			__mod_lruvec_state(from_vec, NR_ANON_MAPPED, -nr_pages);
> -			__mod_lruvec_state(to_vec, NR_ANON_MAPPED, nr_pages);
> +			if (!PageSwapBacked(page) && !PageSwapCache(page) &&
> +			    !PageUnevictable(page)) {
> +				__mod_lruvec_state(from_vec, NR_FILE_PAGES, -nr_pages);
> +				__mod_lruvec_state(to_vec, NR_FILE_PAGES, nr_pages);
> +			} else {
> +				__mod_lruvec_state(from_vec, NR_ANON_MAPPED, -nr_pages);
> +				__mod_lruvec_state(to_vec, NR_ANON_MAPPED, nr_pages);
> +			}
> +
>  			if (PageTransHuge(page)) {
>  				__mod_lruvec_state(from_vec, NR_ANON_THPS,
>  						   -nr_pages);
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 1b84945d655c..690ca7ff2392 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1312,8 +1312,13 @@ static void page_remove_anon_compound_rmap(struct page *page)
>  	if (unlikely(PageMlocked(page)))
>  		clear_page_mlock(page);
>  
> -	if (nr)
> -		__mod_lruvec_page_state(page, NR_ANON_MAPPED, -nr);
> +	if (nr) {
> +		if (PageLRU(page) && PageAnon(page) && !PageSwapBacked(page) &&
> +		    !PageSwapCache(page) && !PageUnevictable(page))
> +			__mod_lruvec_page_state(page, NR_FILE_PAGES, -nr);
> +		else
> +			__mod_lruvec_page_state(page, NR_ANON_MAPPED, -nr);
> +	}
>  }
>  
>  /**
> @@ -1341,12 +1346,17 @@ void page_remove_rmap(struct page *page, bool compound)
>  	if (!atomic_add_negative(-1, &page->_mapcount))
>  		goto out;
>  
> -	/*
> -	 * We use the irq-unsafe __{inc|mod}_zone_page_stat because
> -	 * these counters are not modified in interrupt context, and
> -	 * pte lock(a spinlock) is held, which implies preemption disabled.
> -	 */
> -	__dec_lruvec_page_state(page, NR_ANON_MAPPED);
> +	if (PageLRU(page) && PageAnon(page) && !PageSwapBacked(page) &&
> +	    !PageSwapCache(page) && !PageUnevictable(page)) {
> +		__dec_lruvec_page_state(page, NR_FILE_PAGES);
> +	} else {
> +		/*
> +		 * We use the irq-unsafe __{inc|mod}_zone_page_stat because
> +		 * these counters are not modified in interrupt context, and
> +		 * pte lock(a spinlock) is held, which implies preemption disabled.
> +		 */
> +		__dec_lruvec_page_state(page, NR_ANON_MAPPED);
> +	}
>  
>  	if (unlikely(PageMlocked(page)))
>  		clear_page_mlock(page);
> diff --git a/mm/swap.c b/mm/swap.c
> index 47a47681c86b..340c5276a0f3 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -601,6 +601,7 @@ static void lru_lazyfree_fn(struct page *page, struct lruvec *lruvec,
>  
>  		del_page_from_lru_list(page, lruvec,
>  				       LRU_INACTIVE_ANON + active);
> +		__mod_lruvec_state(lruvec, NR_ANON_MAPPED, -nr_pages);
>  		ClearPageActive(page);
>  		ClearPageReferenced(page);
>  		/*
> @@ -610,6 +611,7 @@ static void lru_lazyfree_fn(struct page *page, struct lruvec *lruvec,
>  		 */
>  		ClearPageSwapBacked(page);
>  		add_page_to_lru_list(page, lruvec, LRU_INACTIVE_FILE);
> +		__mod_lruvec_state(lruvec, NR_FILE_PAGES, nr_pages);
>  
>  		__count_vm_events(PGLAZYFREE, nr_pages);
>  		__count_memcg_events(lruvec_memcg(lruvec), PGLAZYFREE,
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 1b8f0e059767..4821124c70f7 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1428,6 +1428,8 @@ static unsigned int shrink_page_list(struct list_head *page_list,
>  				goto keep_locked;
>  			}
>  
> +			mod_lruvec_page_state(page, NR_ANON_MAPPED, nr_pages);
> +			mod_lruvec_page_state(page, NR_FILE_PAGES, -nr_pages);
>  			count_vm_event(PGLAZYFREED);
>  			count_memcg_page_event(page, PGLAZYFREED);
>  		} else if (!mapping || !__remove_mapping(mapping, page, true,
> -- 
> 2.18.4
> 

-- 
Michal Hocko
SUSE Labs