On Thu 05-09-19 15:10:34, Honglei Wang wrote:
> In the current code, lruvec_lru_size() uses lruvec_page_state_local()
> to get the lru_size. That is based on lruvec_stat_local.count[] of
> mem_cgroup_per_node. This counter is updated in batches: a charge is
> not committed until the number of incoming pages meets
> MEMCG_CHARGE_BATCH, which is currently defined as 32.
>
> The LTP testcase madvise09[1] fails because small blocks of memory
> are not accounted. It creates a new memcgroup and sets up 32
> MADV_FREE pages, then forks a child which introduces memory pressure
> in the memcgroup. The MADV_FREE pages are expected to be released
> under that pressure, but 32 is not more than MEMCG_CHARGE_BATCH, so
> these pages are not accounted in lruvec_stat_local.count[] until more
> pages come in to satisfy the batching threshold. As a result, these
> MADV_FREE pages cannot be freed under memory pressure, which somewhat
> conflicts with the definition of MADV_FREE.

The test case is simply wrong. The caching and the batch size are
internal implementation details. Moreover, MADV_FREE is a _hint_, so
all you can say is that those pages will get freed at some point in
time, but you cannot make any assumptions about when that moment
happens.

> Getting lru_size based on lru_zone_size of mem_cgroup_per_node, which
> is not updated in batches, makes it a bit more accurate in such
> scenarios.

What does that mean? It would be more helpful to describe the code
path which will use this more precise value and what the effect of
that is.

As I've said in the previous version, I do not object to the patch
because a more precise lruvec_lru_size sounds like a nice thing as
long as we are not paying a high price for that. Just look at the
global case for mem_cgroup_disabled(). It uses node_page_state and
that one is using per-cpu accounting with regular global value
refreshing IIRC.

> [1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/madvise/madvise09.c
>
> Signed-off-by: Honglei Wang <honglei.wang@xxxxxxxxxx>
> ---
>  mm/vmscan.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index c77d1e3761a7..c28672460868 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -354,12 +354,13 @@ unsigned long zone_reclaimable_pages(struct zone *zone)
>   */
>  unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru, int zone_idx)
>  {
> -	unsigned long lru_size;
> +	unsigned long lru_size = 0;
>  	int zid;
>
> -	if (!mem_cgroup_disabled())
> -		lru_size = lruvec_page_state_local(lruvec, NR_LRU_BASE + lru);
> -	else
> +	if (!mem_cgroup_disabled()) {
> +		for (zid = 0; zid < MAX_NR_ZONES; zid++)
> +			lru_size += mem_cgroup_get_zone_lru_size(lruvec, lru, zid);
> +	} else
>  		lru_size = node_page_state(lruvec_pgdat(lruvec), NR_LRU_BASE + lru);
>
>  	for (zid = zone_idx + 1; zid < MAX_NR_ZONES; zid++) {
> --
> 2.17.0
--
Michal Hocko
SUSE Labs
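
For context, the batching that the changelog above describes follows
the usual per-cpu counter pattern: each CPU accumulates a local delta
and folds it into the shared counter only once the delta exceeds a
threshold, so a cheap reader of the shared value can lag by up to the
batch size per CPU. Below is a minimal, self-contained sketch of that
pattern. The names (batched_counter, counter_add, and so on) and the
single-threaded structure are invented for illustration; this is not
the actual mm/memcontrol.c code.

	/*
	 * Simplified illustration of batched per-cpu statistics,
	 * loosely modeled on the MEMCG_CHARGE_BATCH scheme discussed
	 * above.  Names are invented for the example.
	 */
	#include <stdio.h>
	#include <stdlib.h>

	#define NR_CPUS      4
	#define CHARGE_BATCH 32	/* mirrors MEMCG_CHARGE_BATCH */

	struct batched_counter {
		long shared;		/* value visible to cheap readers */
		long percpu[NR_CPUS];	/* deltas not yet folded in */
	};

	/* Writer: accumulate locally, flush only once the delta is large. */
	static void counter_add(struct batched_counter *c, int cpu, long val)
	{
		long x = c->percpu[cpu] + val;

		if (labs(x) > CHARGE_BATCH) {	/* flush threshold crossed */
			c->shared += x;
			x = 0;
		}
		c->percpu[cpu] = x;
	}

	/* Cheap reader: sees only flushed updates, so it may lag by up
	 * to CHARGE_BATCH pages per CPU. */
	static long counter_read_cheap(const struct batched_counter *c)
	{
		return c->shared;
	}

	/* Precise reader: also sums the unflushed per-cpu deltas. */
	static long counter_read_precise(const struct batched_counter *c)
	{
		long x = c->shared;
		int cpu;

		for (cpu = 0; cpu < NR_CPUS; cpu++)
			x += c->percpu[cpu];
		return x;
	}

	int main(void)
	{
		struct batched_counter lru_pages = { 0 };
		int i;

		/* Add the 32 MADV_FREE pages from the madvise09
		 * scenario one by one, all on CPU 0.  32 is not *more*
		 * than CHARGE_BATCH, so the delta never crosses the
		 * flush threshold. */
		for (i = 0; i < 32; i++)
			counter_add(&lru_pages, 0, 1);

		printf("cheap read:   %ld\n", counter_read_cheap(&lru_pages));   /* 0  */
		printf("precise read: %ld\n", counter_read_precise(&lru_pages)); /* 32 */
		return 0;
	}

Run, the cheap read reports 0 while 32 pages are actually sitting in
the per-cpu delta, which is the situation madvise09 trips over; the
patch sidesteps it by summing the per-zone LRU sizes, which, per the
changelog, are maintained without batching.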