On Wed 17-02-21 12:41:36, Tim Chen wrote: > On a per node basis, the mem cgroup soft limit tree on each node tracks > how much a cgroup has exceeded its soft limit memory limit and sorts > the cgroup by its excess usage. On page release, the trees are not > updated right away, until we have gathered a batch of pages belonging to > the same cgroup. This reduces the frequency of updating the soft limit tree > and locking of the tree and associated cgroup. > > However, the batch of pages could contain pages from multiple nodes but > only the soft limit tree from one node would get updated. Change the > logic so that we update the tree in batch of pages, with each batch of > pages all in the same mem cgroup and memory node. An update is issued for > the batch of pages of a node collected till now whenever we encounter > a page belonging to a different node. Note that this batching for > the same node logic is only relevant for v1 cgroup that has a memory > soft limit. Let me paste the discussion related to this patch from other reply: > >> For patch 3 regarding the uncharge_batch, it > >> is more of an observation that we should uncharge in batch of same node > >> and not prompted by actual workload. > >> Thinking more about this, the worst that could happen > >> is we could have some entries in the soft limit tree that overestimate > >> the memory used. The worst that could happen is a soft page reclaim > >> on that cgroup. The overhead from extra memcg event update could > >> be more than a soft page reclaim pass. So let's drop patch 3 > >> for now. > > > > I would still prefer to handle that in the soft limit reclaim path and > > check each memcg for the soft limit reclaim excess before the reclaim. > > > > Something like this? > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 8bddee75f5cb..b50cae3b2a1a 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -3472,6 +3472,14 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order, > if (!mz) > break; > > + /* > + * Soft limit tree is updated based on memcg events sampling. > + * We could have missed some updates on page uncharge and > + * the cgroup is below soft limit. Skip useless soft reclaim. > + */ > + if (!soft_limit_excess(mz->memcg)) > + continue; > + > nr_scanned = 0; > reclaimed = mem_cgroup_soft_reclaim(mz->memcg, pgdat, Yes I meant something like this but then I have looked more closely and this shouldn't be needed afterall. __mem_cgroup_largest_soft_limit_node already does all the work if (!soft_limit_excess(mz->memcg) || !css_tryget(&mz->memcg->css)) goto retry; so this shouldn't really happen. -- Michal Hocko SUSE Labs