lruvec_lru_size() is invoking lruvec_page_state_local() to get the
lru_size in the current code. It is based on lruvec_stat_local.count[]
of mem_cgroup_per_node. This counter is updated in batches: it doesn't
get updated until the number of pending pages exceeds
MEMCG_CHARGE_BATCH, which is defined as 32 right now.

The LTP testcase madvise09 [1] fails because of this: a small block of
memory doesn't get accounted in time. The testcase creates a new
memcgroup and sets up 32 MADV_FREE pages, then forks a child which
introduces memory pressure in the memcgroup. The MADV_FREE pages are
expected to be released under that pressure, but 32 is not more than
MEMCG_CHARGE_BATCH, so these pages won't show up in
lruvec_stat_local.count[] until more pages come in to satisfy the
batching threshold. As a result, the MADV_FREE pages can't be freed
under memory pressure, which conflicts with the definition of
MADV_FREE.

Getting the lru size from lru_zone_size of mem_cgroup_per_node, which
is not updated in batches, makes it more accurate in this kind of
scenario.

[1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/madvise/madvise09.c

Signed-off-by: Honglei Wang <honglei.wang@xxxxxxxxxx>
---
 mm/vmscan.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index c77d1e3761a7..c28672460868 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -354,12 +354,13 @@ unsigned long zone_reclaimable_pages(struct zone *zone)
  */
 unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru, int zone_idx)
 {
-	unsigned long lru_size;
+	unsigned long lru_size = 0;
 	int zid;
 
-	if (!mem_cgroup_disabled())
-		lru_size = lruvec_page_state_local(lruvec, NR_LRU_BASE + lru);
-	else
+	if (!mem_cgroup_disabled()) {
+		for (zid = 0; zid < MAX_NR_ZONES; zid++)
+			lru_size += mem_cgroup_get_zone_lru_size(lruvec, lru, zid);
+	} else
 		lru_size = node_page_state(lruvec_pgdat(lruvec), NR_LRU_BASE + lru);
 
 	for (zid = zone_idx + 1; zid < MAX_NR_ZONES; zid++) {
-- 
2.17.0