On 27.03.2018 13:00, Vladimir Davydov wrote: > On Mon, Mar 26, 2018 at 06:29:05PM +0300, Kirill Tkhai wrote: >>>> @@ -182,6 +187,9 @@ struct mem_cgroup { >>>> unsigned long low; >>>> unsigned long high; >>>> >>>> + /* Bitmap of shrinker ids suitable to call for this memcg */ >>>> + struct shrinkers_map __rcu *shrinkers_map; >>>> + >>> >>> We keep all per-node data in mem_cgroup_per_node struct. I think this >>> bitmap should be defined there as well. >> >> But them we'll have to have struct rcu_head for every node to free the map >> via rcu. This is the only reason I did that. But if you think it's not a problem, >> I'll agree with you. > > I think it's OK. It'd be consistent with how list_lru handles > list_lru_memcg reallocations. > >>>> @@ -4487,6 +4490,8 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css) >>>> struct mem_cgroup *memcg = mem_cgroup_from_css(css); >>>> struct mem_cgroup_event *event, *tmp; >>>> >>>> + free_shrinker_maps(memcg); >>>> + >>> >>> AFAIU this can race with shrink_slab accessing the map, resulting in >>> use-after-free. IMO it would be safer to free the bitmap from css_free. >> >> But doesn't shrink_slab() iterate only online memcg? > > Well, yes, shrink_slab() bails out if the memcg is offline, but I > suspect there might be a race condition between shrink_slab and > css_offline when shrink_slab calls shrinkers for an offline cgroup. > >> >>>> /* >>>> * Unregister events and notify userspace. >>>> * Notify userspace about cgroup removing only after rmdir of cgroup >>>> diff --git a/mm/vmscan.c b/mm/vmscan.c >>>> index 97ce4f342fab..9d1df5d90eca 100644 >>>> --- a/mm/vmscan.c >>>> +++ b/mm/vmscan.c >>>> @@ -165,6 +165,10 @@ static DECLARE_RWSEM(bitmap_rwsem); >>>> static int bitmap_id_start; >>>> static int bitmap_nr_ids; >>>> static struct shrinker **mcg_shrinkers; >>>> +struct shrinkers_map *__rcu root_shrinkers_map; >>> >>> Why do you need root_shrinkers_map? AFAIR the root memory cgroup doesn't >>> have kernel memory accounting enabled. >> But we can charge the corresponding lru and iterate it over global reclaim, >> don't we? > > Yes, I guess you're right. But do we need to care about it? Would it be > OK if we iterated over all shrinkers for the root cgroup? Dunno... In case of 2000 shrinkers, this will flush the cache. This is the reason :) > Anyway, please try to handle the root cgroup consistently with other > cgroups. I mean, nothing like this root_shrinkers_map should exist. > It should be either a part of root_mem_cgroup or we should iterate over > all shrinkers for the root cgroup. It's not possible. root_mem_cgroup does not exist always. Even if CONFIG_MEMCG is enabled, memcg may be prohibited by boot params. In case of it's not prohibited, there are some shrinkers, which are registered before it's initialized, while memory_cgrp_subsys can't has .early_init = 1. >> >> struct list_lru_node { >> ... >> /* global list, used for the root cgroup in cgroup aware lrus */ >> struct list_lru_one lru; >> ... >> }; Kirill