On Tue, Jun 11, 2019 at 4:18 PM Roman Gushchin <guro@xxxxxx> wrote:
>
> Currently each charged slab page holds a reference to the cgroup to
> which it's charged. Kmem_caches are held by the memcg and are released
> all together with the memory cgroup. It means that none of kmem_caches
> are released unless at least one reference to the memcg exists, which
> is very far from optimal.
>
> Let's rework it in a way that allows releasing individual kmem_caches
> as soon as the cgroup is offline, the kmem_cache is empty and there
> are no pending allocations.
>
> To make it possible, let's introduce a new percpu refcounter for
> non-root kmem caches. The counter is initialized to the percpu mode,
> and is switched to the atomic mode during kmem_cache deactivation. The
> counter is bumped for every charged page and also for every running
> allocation. So the kmem_cache can't be released unless all allocations
> complete.
>
> To shutdown non-active empty kmem_caches, let's reuse the work queue,
> previously used for the kmem_cache deactivation. Once the reference
> counter reaches 0, let's schedule an asynchronous kmem_cache release.
>
> * I used the following simple approach to test the performance
> (stolen from another patchset by T.
> Harding):
>
> time find / -name fname-no-exist
> echo 2 > /proc/sys/vm/drop_caches
> repeat 10 times
>
> Results:
>
>         orig                    patched
>
> real    0m1.455s        real    0m1.355s
> user    0m0.206s        user    0m0.219s
> sys     0m0.855s        sys     0m0.807s
>
> real    0m1.487s        real    0m1.699s
> user    0m0.221s        user    0m0.256s
> sys     0m0.806s        sys     0m0.948s
>
> real    0m1.515s        real    0m1.505s
> user    0m0.183s        user    0m0.215s
> sys     0m0.876s        sys     0m0.858s
>
> real    0m1.291s        real    0m1.380s
> user    0m0.193s        user    0m0.198s
> sys     0m0.843s        sys     0m0.786s
>
> real    0m1.364s        real    0m1.374s
> user    0m0.180s        user    0m0.182s
> sys     0m0.868s        sys     0m0.806s
>
> real    0m1.352s        real    0m1.312s
> user    0m0.201s        user    0m0.212s
> sys     0m0.820s        sys     0m0.761s
>
> real    0m1.302s        real    0m1.349s
> user    0m0.205s        user    0m0.203s
> sys     0m0.803s        sys     0m0.792s
>
> real    0m1.334s        real    0m1.301s
> user    0m0.194s        user    0m0.201s
> sys     0m0.806s        sys     0m0.779s
>
> real    0m1.426s        real    0m1.434s
> user    0m0.216s        user    0m0.181s
> sys     0m0.824s        sys     0m0.864s
>
> real    0m1.350s        real    0m1.295s
> user    0m0.200s        user    0m0.190s
> sys     0m0.842s        sys     0m0.811s
>
> So it looks like the difference is not noticeable in this test.
>
> Signed-off-by: Roman Gushchin <guro@xxxxxx>
> Acked-by: Vladimir Davydov <vdavydov.dev@xxxxxxxxx>

Reviewed-by: Shakeel Butt <shakeelb@xxxxxxxxxx>