On Mon 17-06-13 09:51:29, Tejun Heo wrote:
> Hello, Michal.
>
> On Mon, Jun 17, 2013 at 03:51:22PM +0200, Michal Hocko wrote:
> > > Some configurations which are legitimate under the current parent
> > > might be invalid when put under a different parent.
> >
> > yes, for example all configurations where old parent is more restrictive
> > than the new one. For example. hardlimit in memcg or even more
>
> I'm not following the hardlimit part. Shouldn't a given hardlimit
> value have the same meaning regardless of where the node is located?
> Its application will surely be different but its meaning would be the
> same, no?

Yes, the hardlimit example was misleading. I was thinking about all the
consequences. Sorry about the confusion.

> > oom_control, swappiness or use_hierarchy which are expected to be
> > consistent down the hierarchy.
>
> use_hierarchy is going away. Can you please explain how oom_control
> and swappiness behave?

Both have to be consistent throughout the hierarchy currently. The same
swappiness is used for hierarchical reclaim at any level. I plan to
remove this restriction because it is really too restrictive. We should
be able to say that some groups in a hierarchy shouldn't swap etc.

I am not sure about oom_control yet, though. OOM is handled at the first
hierarchy level which cannot be pushed down to its hard limit, and the
whole subtree is oom frozen. It would be a little awkward if a group down
the hierarchy hit its own limit as well and fired OOM while the OOM is
frozen up the hierarchy and user space tries to find out the best
candidate to kill in that hierarchy:

	A (under_oom=true) - waiting for userspace
	 \
	  .
	  .
	  .
	   \
	    B - hit the limit and call mem_cgroup_out_of_memory

On the other hand, the handler should be prepared to handle exiting tasks
(which might change the oom situation), so under_oom has to be checked
after a victim is selected and before it is killed to prevent excessive
killing.

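A minimal userspace sketch of that re-check, assuming the handler reads
the group's memory.oom_control file (which reports oom_kill_disable and
under_oom in cgroup v1). The helper names parse_under_oom() and
should_kill_victim() are hypothetical, and a real handler would also have
to read the file and deliver the signal itself:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: extract the under_oom value from the text of a
 * memory.oom_control file, e.g. "oom_kill_disable 1\nunder_oom 1\n".
 * Returns 0 or 1 as read, or -1 if the field cannot be found or parsed.
 */
static int parse_under_oom(const char *text)
{
    const char *field;
    int value;

    assert(text != NULL);

    field = strstr(text, "under_oom");
    if (field == NULL || sscanf(field, "under_oom %d", &value) != 1)
        return -1;
    return value;
}

/*
 * Hypothetical helper: called after a victim has been selected and right
 * before it is killed. The text must be a fresh read of memory.oom_control,
 * because a task may have exited (or an OOM lower in the hierarchy may have
 * freed memory) while the handler was choosing. Only a group that is still
 * under OOM needs the kill; a parse failure is treated as "do not kill".
 */
static bool should_kill_victim(const char *fresh_oom_control_text)
{
    return parse_under_oom(fresh_oom_control_text) == 1;
}
```

The handler would call should_kill_victim() with the file contents read
after victim selection and skip the kill when it returns false. This does
not close the race completely, it only narrows the window.
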
It would be even more interesting if the situation were the opposite. A
child has the oom disabled and waits for userspace to handle the
situation.

	        A (oom_control=0)
	       / \___________
	      /              \
	B (charge)       C (oom_control=1)

The parent gets under oom as well because a sibling pushes it to the hard
limit. The parent wouldn't be able to oom freeze the hierarchy
(mem_cgroup_oom_lock) because there is an ongoing oom down the hierarchy,
so it would basically be waiting for userspace as well. This would be
quite unexpected. While the oom freezing could be tweaked to handle this,
it sounds quite messy to me.

> > The biggest problem I can see is how the core cgroup code know when it is
> > OK to migrate. There might be some ongoing operations that depend on the
> > current tree structure. For example the hierarchical reclaim or oom
> > etc..
>
> Internally, I think it should be implemented as task migrating to
> another cgroup - IOW, to controllers, it'll appear the same as the
> userland echoing the pid to cgroup.procs file on the new cgroup.

OK, that should work. The controller should still have an option to say
that the current configuration is not compatible with its new parent.
E.g. for memcg, if parent->oom_control != moving_memcg->oom_control, the
move would fail.

> That's the only sane way to implement it as controllers need to do
> everything which it does for the normal task migration case anyway.
> Matching the impedance would be the responsibility of cgroup core.
>
> > I do not think that soft reclaim which I was talking about at LSF would
> > change anything here as it would be pretty much the same as the hard
> > limit. But that is not so important.
>
> I think it's very important to have trivially stackable
> configurations. Maybe we can make more complex semantics work too but
> it's gonna be confusing like hell when combined with the level of
> automation we're hoping to achieve.
>
> Thanks.
>
> --
> tejun

-- 
Michal Hocko
SUSE Labs