On Mon, Feb 08, 2016 at 01:23:53AM -0500, Johannes Weiner wrote:
> On Sun, Feb 07, 2016 at 08:27:35PM +0300, Vladimir Davydov wrote:
> > Workingset code was recently made memcg aware, but the shadow node
> > shrinker is still global. As a result, one small cgroup can consume
> > all memory available for shadow nodes, possibly hurting other cgroups
> > by reclaiming their shadow nodes, even though the reclaim distances
> > stored in its own shadow nodes have no effect. To avoid this, we need
> > to make the shadow node shrinker memcg aware.
> >
> > Signed-off-by: Vladimir Davydov <vdavydov@xxxxxxxxxxxxx>
>
> This patch is straightforward, but there is one tiny thing that bugs
> me about it, and that is switching from available memory to the size
> of the active list. Because the active list can shrink drastically at
> runtime.

Yeah, the active file lru is a volatile thing indeed. Not only can it
shrink rapidly, it can also grow in an instant (e.g. due to
mark_page_accessed), so you're right - sizing the shadow node lru based
solely on the active lru size would be too unpredictable.

>
> It's true that both the shrinking of the active list and subsequent
> activations to regrow it will reduce the number of actionable
> refaults, and so it wouldn't be unreasonable to also shrink shadow
> nodes when the active list shrinks.
>
> However, I think these are too many assumptions to encode in the
> shrinker, because it is only meant to prevent a worst-case explosion
> of radix tree nodes. I'd prefer it to be dumb and conservative.
>
> Could we instead go with the current usage of the memcg? Whether
> reclaim happens globally or due to the memory limit, the usage at the
> time of reclaim gives a good idea of the memory available to the
> group. But it makes fewer assumptions about the internal composition
> of the memcg's memory, and the consequences associated with that.

But that would likely result in wasting a considerable chunk of memory
on stale shadow nodes in case file caches constitute only a small part
of the memcg's memory consumption, which isn't good IMHO.

Maybe we'd better use LRU_ALL_FILE / 2 instead?

diff --git a/mm/workingset.c b/mm/workingset.c
index 8c07cd8af15e..8a75f8d2916a 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -351,9 +351,10 @@ static unsigned long count_shadow_nodes(struct shrinker *shrinker,
 
 	if (memcg_kmem_enabled())
 		pages = mem_cgroup_node_nr_lru_pages(sc->memcg, sc->nid,
-						     BIT(LRU_ACTIVE_FILE));
+						     LRU_ALL_FILE);
 	else
-		pages = node_page_state(sc->nid, NR_ACTIVE_FILE);
+		pages = node_page_state(sc->nid, NR_ACTIVE_FILE) +
+			node_page_state(sc->nid, NR_INACTIVE_FILE);
 
 	/*
 	 * Active cache pages are limited to 50% of memory, and shadow
@@ -369,7 +370,7 @@ static unsigned long count_shadow_nodes(struct shrinker *shrinker,
 	 *
 	 * PAGE_SIZE / radix_tree_nodes / node_entries / PAGE_SIZE
	 */
-	max_nodes = pages >> (RADIX_TREE_MAP_SHIFT - 3);
+	max_nodes = pages >> (1 + RADIX_TREE_MAP_SHIFT - 3);
 
 	if (shadow_nodes <= max_nodes)
 		return 0;
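
For illustration, here is a quick userspace sketch of the two limit
formulas (not kernel code; RADIX_TREE_MAP_SHIFT is assumed to be 6, its
usual value, and the page counts are made-up examples):

#include <stdio.h>

#define RADIX_TREE_MAP_SHIFT	6	/* 64 slots per radix tree node (typical config) */

int main(void)
{
	/* made-up example: 1G of active and 1G of inactive file cache, 4K pages */
	unsigned long active_file = 1UL << 18;
	unsigned long inactive_file = 1UL << 18;

	/* current formula: limit derived from the active file list only */
	unsigned long max_old = active_file >> (RADIX_TREE_MAP_SHIFT - 3);

	/* proposed formula: whole file lru, halved by the extra shift */
	unsigned long max_new = (active_file + inactive_file) >>
				(1 + RADIX_TREE_MAP_SHIFT - 3);

	printf("max_nodes (active file only):     %lu\n", max_old);
	printf("max_nodes (all file lru, halved): %lu\n", max_new);
	return 0;
}

With active and inactive file equally sized the two limits coincide
(32768 nodes in this example), but when the active list collapses while
the total file cache stays put, the proposed limit doesn't drop with it.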