On Wed 20-05-15 19:53:02, Oleg Nesterov wrote:
> On 05/20, Michal Hocko wrote:
> >
> > So I assume the leader simply waits for its threads to finish and it
> > stays in the sibling list. __unhash_process seems like it does the final
> > cleanup and unlinks the leader from the lists. Which means that
> > mm_update_next_owner never sees !group_leader. Is that correct Oleg?
>
> Yes, yes, the group leader can't go away until the whole thread-group dies.

OK, then we should have a guarantee that mm->owner is always the thread
group leader, right?

> But can't we kill mm->owner somehow?

I would be happy about that. But it is not that simple.

> I mean, turn it into something else,
> ideally into "struct mem_cgroup *" although I doubt this is possible.

Sounds like a good idea but... it duplicates the cgroup tracking into
two places and that is asking for trouble. On the other hand we are
doing that already because mm->owner might be in a different cgroup than
the current task. However, this is an inherent problem because CLONE_VM
doesn't imply CLONE_THREAD. So in the end it doesn't look much worse
IMO. We will lose the "this task is in charge" aspect and that would be
a user space visible change, but I am not sure how much of a problem
that is. Maybe somebody is (ab)using this to work around the restriction
that all threads are in the same cgroup.

From the implementation POV it even looks easier because we just have
to hook into fork (pin the memcg on dup_mm), into attach (to change the
memcg) and into mmput (to unpin the memcg).

I will think about that some more.

> It would be nice to kill mm_update_next_owner()/etc, this looks really
> ugly. We only need it for mem_cgroup_from_task(), and it would be much
> more clean to have mem_cgroup_from_mm(struct mm_struct *mm), imho.
>
> Oleg.
>

-- 
Michal Hocko
SUSE Labs
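
To make the three hooks mentioned above (fork, attach, mmput) a bit more
concrete, here is a rough userspace model of the idea. It is only a
sketch and not kernel code: the structures are toy stand-ins, the plain
int counters ignore locking and RCU entirely, and every helper name
except mem_cgroup_from_mm() (which comes from Oleg's mail) is made up
for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; assumptions, not the real layout. */
struct mem_cgroup {
	int refcnt;			/* number of mms pinning this memcg */
};

struct mm_struct {
	int mm_users;			/* users of this address space */
	struct mem_cgroup *memcg;	/* hypothetical replacement for mm->owner */
};

/* Hypothetical pin/unpin helpers (the kernel would use css references). */
static void memcg_pin(struct mem_cgroup *memcg)
{
	memcg->refcnt++;
}

static void memcg_unpin(struct mem_cgroup *memcg)
{
	assert(memcg->refcnt > 0);
	memcg->refcnt--;
}

/* First mm of a task: start with one user and pin the given memcg. */
static void mm_init_memcg(struct mm_struct *mm, struct mem_cgroup *memcg)
{
	mm->mm_users = 1;
	mm->memcg = memcg;
	memcg_pin(memcg);
}

/* fork hook: a new mm created by dup_mm inherits and pins the parent's memcg. */
static void dup_mm_memcg(struct mm_struct *child, const struct mm_struct *parent)
{
	mm_init_memcg(child, parent->memcg);
}

/* attach hook: move the mm to another memcg, pinning new before dropping old. */
static void mm_attach_memcg(struct mm_struct *mm, struct mem_cgroup *to)
{
	struct mem_cgroup *from = mm->memcg;

	if (from == to)
		return;
	memcg_pin(to);
	mm->memcg = to;
	memcg_unpin(from);
}

/* mmput hook: the last user of the mm drops the pin. */
static void mmput_memcg(struct mm_struct *mm)
{
	if (--mm->mm_users == 0) {
		memcg_unpin(mm->memcg);
		mm->memcg = NULL;
	}
}

/* The lookup needs no owner task at all, so there is no next-owner search. */
static struct mem_cgroup *mem_cgroup_from_mm(const struct mm_struct *mm)
{
	return mm->memcg;
}
```

In this model a CLONE_VM task without CLONE_THREAD simply shares the mm
and therefore its memcg, whichever cgroup the task itself sits in. That
is the user visible change discussed above.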