On 24 July 2018 at 12:05, Bruce Merry <bmerry@xxxxxxxxx> wrote:
> To reproduce:
> 1. Start cadvisor running. I use the 0.30.2 binary from Github, and
> run it with sudo ./cadvisor-0.30.2 --logtostderr=true
> 2. Run the Python 3 script below, which repeatedly creates a cgroup,
> enters it, stats some files in it, and leaves it again (and removes
> it). It takes a few minutes to run.
> 3. time cat /sys/fs/cgroup/memory/memory.stat. It now takes about 20ms for me.
> 4. sudo sysctl vm.drop_caches=2
> 5. time cat /sys/fs/cgroup/memory/memory.stat. It is back to 1-2ms.
>
> I've also added some code to memcg_stat_show to report the number of
> cgroups in the hierarchy (iterations in for_each_mem_cgroup_tree).
> Running the script increases it from ~700 to ~41000. The script
> iterates 250,000 times, so only some fraction of the cgroups become
> zombies.

I've discovered that I'd messed up that instrumentation code (it was
incrementing inside a loop, so it counted 5x too many cgroups), so some
of the things I said turn out to be wrong. Let me try again:

- Running the script generates about 8000 zombies (not 40000), with or
without Shakeel's patch, for 250,000 cgroups created/destroyed. So
possibly there is some timing condition that makes them into zombies.
I've only measured it with 4.17, but based on timing results I have no
particular reason to think it's wildly different to older kernels.

- After running the script 5 times (to generate 40K zombies), getting
the stats takes 20ms with Shakeel's patch and 80ms without it (on
4.17.9), which is a speedup of the same order of magnitude as Shakeel
observed with non-zombies.

- 4.17.9 already seems to be an improvement over 4.15: with 40K
(non-zombie) cgroups, memory.stat time decreases from 200ms to 75ms.

So with 4.15 -> 4.17.9 plus Shakeel's patch, the effects are reduced
by an order of magnitude, which is good news. Of course, that doesn't
solve the fundamental issue of why the zombies get generated in the
first place.

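For anyone repeating the timings in steps 3 and 5 of the quoted
reproduction, a single "time cat" can be noisy. The following is a
minimal Python 3 sketch that reads the file several times and reports
the spread. It is not the reproducer script referred to in the quoted
text; the function name time_reads, the default of 20 repeats, and the
use of the cgroup v1 path as the default are illustrative assumptions.

```python
import statistics
import sys
import time

# Assumed cgroup v1 path, copied from the reproduction steps quoted above.
DEFAULT_PATH = "/sys/fs/cgroup/memory/memory.stat"


def time_reads(path, repeats=20):
    """Return the wall-clock time in milliseconds of each full read of path.

    time_reads is a hypothetical helper name, not part of any library.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        with open(path, "rb") as f:
            f.read()  # the kernel generates the stats when the file is read
        timings.append((time.perf_counter() - start) * 1000.0)
    return timings


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_PATH
    ms = time_reads(path)
    print(f"{path}: median {statistics.median(ms):.2f} ms, "
          f"min {min(ms):.2f} ms, max {max(ms):.2f} ms over {len(ms)} reads")
```

Running it once after the cgroup churn and once after
"sudo sysctl vm.drop_caches=2" should give the two figures to compare.
It includes Python's own file-open overhead, so absolute numbers will
not match "time cat" exactly.
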
I'm not a kernel developer and I very much doubt I'll have the time to
try to debug what may turn out to be a race condition, but let me know
if I can help with testing things.

Regards
Bruce
--
Bruce Merry
Senior Science Processing Developer
SKA South Africa