On Thu 21-05-15 13:22:17, Johannes Weiner wrote:
> On Wed, May 20, 2015 at 10:22:21PM +0200, Michal Hocko wrote:
> > On Wed 20-05-15 19:53:02, Oleg Nesterov wrote:
> > > On 05/20, Michal Hocko wrote:
> > > >
> > > > So I assume the leader simply waits for its threads to finish and it
> > > > stays in the sibling list. __unhash_process seems like it does the final
> > > > cleanup and unlinks the leader from the lists. Which means that
> > > > mm_update_next_owner never sees !group_leader. Is that correct, Oleg?
> > >
> > > Yes, yes, the group leader can't go away until the whole thread-group dies.
> >
> > OK, then we should have a guarantee that mm->owner is always the thread
> > group leader, right?
> >
> > > But can't we kill mm->owner somehow?
> >
> > I would be happy about that. But it is not that simple.
> >
> > > I mean, turn it into something else,
> > > ideally into "struct mem_cgroup *" although I doubt this is possible.
> >
> > Sounds like a good idea but... it duplicates the cgroup tracking into
> > two places and that asks for trouble. On the other hand we are doing
> > that already because mm->owner might be in a different cgroup than the
> > current. However, this is an inherent problem because CLONE_VM doesn't
> > imply CLONE_THREAD. So in the end it doesn't look much worse IMO.
> > We will lose the "this task is in charge" aspect and that would
> > be a user space visible change but I am not sure how much it is a
> > problem. Maybe somebody is (ab)using this to work around the restriction
> > that all threads are in the same cgroup.
>
> If mm->owner is currently always the threadgroup leader, it should be
> fairly straightforward to maintain mm->memcg on all events that move
> any threadgroup leader between cgroups, without having mm->owner, no?

I have a tentative patch for that. It is fairly straightforward and it
even reduces the code size. I plan to post it early next week after it
gets some testing.
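The idea quoted above (caching the memcg in the mm and refreshing it whenever a thread-group leader moves between cgroups) can be illustrated with a minimal userspace model. This is only a sketch under stated assumptions: `struct task`, `struct mm`, `memcg_via_owner()`, `memcg_direct()` and `migrate_task()` are hypothetical, heavily simplified names, not kernel API, and this is not the tentative patch mentioned above. It shows the extra dereference through the owner task, and the rule that only a leader's migration updates the cached pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model; NOT the kernel's real
 * mm_struct / task_struct / mem_cgroup definitions. */
struct mem_cgroup { int id; };

struct task;

struct mm {
    struct task *owner;        /* current scheme: mm -> owning task */
    struct mem_cgroup *memcg;  /* proposed scheme: memcg cached in the mm */
};

struct task {
    struct task *group_leader; /* points to itself for a leader */
    struct mem_cgroup *memcg;  /* cgroup the task is currently in */
    struct mm *mm;             /* address space, may be shared via CLONE_VM */
};

/* Current scheme: two dereferences, mm -> owner -> memcg. */
static struct mem_cgroup *memcg_via_owner(const struct mm *mm)
{
    return mm->owner ? mm->owner->memcg : NULL;
}

/* Proposed scheme: one dereference, mm -> memcg. */
static struct mem_cgroup *memcg_direct(const struct mm *mm)
{
    return mm->memcg;
}

/* Hypothetical migration hook: move a task to another memcg and, if it
 * is a thread-group leader, refresh the cached pointer in its mm.
 * Assumption: when several leaders share one mm (CLONE_VM without
 * CLONE_THREAD), the last leader to move wins in this model. */
static void migrate_task(struct task *t, struct mem_cgroup *to)
{
    assert(t != NULL && to != NULL);
    t->memcg = to;
    if (t->group_leader == t && t->mm != NULL)
        t->mm->memcg = to;
}
```

In this model, moving a non-leader thread leaves `memcg_direct()` unchanged, while moving the leader updates it, so both lookups agree as long as every leader migration goes through the hook.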
The primary thing I am worried about is the user-visible behavior
change, though.

> It would have a lot of benefits for sure. The code would be simpler,
> but it would also reduce some of the cost that Mel is observing inside
> __mem_cgroup_count_vm_event(), by reducing one level of indirection.

Agreed!
--
Michal Hocko
SUSE Labs