>> >> 2. memcg's __DEPRECATED_clear_css_refs >> >> This is a remnant of another weird design decision of requiring >> synchronous draining of refcnts on cgroup removal and allowing >> subsystems to veto cgroup removal - what's the userspace supposed to >> do afterwards? Note that this also hinders co-mounting different >> controllers. >> >> The behavior could be useful for development and debugging but it >> unnecessarily interlocks userland visible behavior with in-kernel >> implementation details. To me, it seems outright wrong (either >> implement proper severing semantics in the controller or do full >> refcnting) and disallows, for example, lazy drain of caching refs. >> Also, it complicates the removal path with try / commit / revert >> logic which has never been fully correct since the beginning. >> >> Currently, the only left user is memcg. >> >> Solution: >> >> * Update memcg->pre_destroy() such that it never fails. >> >> * Drop __DEPRECATED_clear_css_refs and all related logic. >> Convert pre_destroy() to return void. >> >> Who: >> >> KAMEZAWA, Michal, PLEASE. I will make __DEPRECATED_clear_css_refs >> trigger WARN sooner or later. Let's please get this settled. >> >> 3. cgroup_mutex usage outside cgroup core >> >> This is another thing which is simply broken. Given the way cgroup >> is structured and used, nesting cgroup_mutex inside any other >> commonly used lock simply doesn't work - it's held while invoking >> controller callbacks which then interact and synchronize with >> various core subsystems. >> >> There are currently three external cgroup_mutex users - cpuset, >> memcontrol and cgroup_freezer. >> >> Solution: >> >> Well, we should just stop doing it - use a separate nested lock >> (which seems possible for cgroup_freezer) or track and mange task >> in/egress some other way. >> >> Who: >> >> I'll do the cgroup_freezer. I'm hoping PeterZ or someone who's >> familiar with the code base takes care of cpuset. Michal, can you >> please take care of memcg? >> > > I think this is a pressing problem, yes, but not the only problem with > cgroup lock. Even if we restrict its usage to cgroup core, we still can > call cgroup functions, which will lock. And then we gain nothing. > Agreed. The biggest issue in cpuset is if hotplug makes a cpuset's cpulist empty the tasks in it will be moved to an ancestor cgroup, which requires holding cgroup lock. We have to either change cpuset's behavior or eliminate the global lock. > And the problem is that people need to lock. cgroup_lock is needed > because the data you are accessing is protected by it. The way I see it, > it is incredible how we were able to revive the BKL in the form of > cgroup_lock after we finally manage to successfully get rid of it! > > We should just start to do a more fine grained locking of data, instead > of "stop the world, cgroup just started!". If we do that, the problem > you are trying to address here will even cease to exist. > _______________________________________________ Containers mailing list Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linuxfoundation.org/mailman/listinfo/containers