Hi Chris,

On 2019/6/16 PM 6:37, Chris Down wrote:
> Hi Xunlei,
>
> Xunlei Pang writes:
>> docker and various types(different memory capacity) of containers
>> are managed by k8s, it's a burden for k8s to maintain those dynamic
>> figures, simply set "max" to key containers is always welcome.
>
> Right, setting "max" is generally a fine way of going about it.
>
>> Set "max" to docker also protects docker cgroup memory(as docker
>> itself has tasks) unnecessarily.
>
> That's not correct -- leaf memcgs have to _explicitly_ request memory
> protection. From the documentation:
>
>     memory.low
>
>         [...]
>
>         Best-effort memory protection. If the memory usages of a
>         cgroup and all its ancestors are below their low boundaries,
>         the cgroup's memory won't be reclaimed unless memory can be
>         reclaimed from unprotected cgroups.
>
> Note the part that the cgroup itself also must be within its low
> boundary, which is not implied simply by having ancestors that would
> permit propagation of protections.
>
> In this case, Docker just shouldn't request it for those Docker-related
> tasks, and they won't get any. That seems a lot simpler and more
> intuitive than special casing "0" in ancestors.
>
>> This patch doesn't take effect on any intermediate layer with
>> positive memory.min set, it requires all the ancestors having
>> 0 memory.min to work.
>>
>> Nothing special change, but more flexible to business deployment...
>
> Not so, this change is extremely "special". It violates the basic
> expectation that 0 means no possibility of propagation of protection,
> and I still don't see a compelling argument why Docker can't just set
> "max" in the intermediate cgroup and not accept any protection in leaf
> memcgs that it doesn't want protection for.

Now I see the reason: I'm using cgroup v1 (with memory.min backported),
which permits tasks in the "docker" cgroup's cgroup.procs. For cgroup v2,
this is not a problem.

Thanks,
Xunlei
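
P.S. For illustration only, a minimal sketch of the setup Chris describes,
assuming a cgroup v2 hierarchy mounted at /sys/fs/cgroup and hypothetical
container cgroup names (not taken from the patch): protection has to be
requested explicitly per leaf, so a leaf left at "0" gets none no matter
what "docker" itself is set to.

    # Sketch only: hypothetical cgroup names under an assumed v2 mount point.
    from pathlib import Path

    CGROUP_ROOT = Path("/sys/fs/cgroup")  # assumes the unified (v2) hierarchy

    def set_memory_min(cgroup: str, value: str) -> None:
        # Write memory.min for the given cgroup ("0", "512M", "max", ...).
        (CGROUP_ROOT / cgroup / "memory.min").write_text(value)

    # Intermediate "docker" cgroup: "max" only allows protection to propagate;
    # on v2 it holds no tasks of its own, so nothing is protected here directly.
    set_memory_min("docker", "max")

    # Leaves must request protection explicitly.
    set_memory_min("docker/key-container", "max")    # protected key container
    set_memory_min("docker/batch-container", "0")    # unprotected, despite the ancestor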