Hello, Michal. On Sun, Apr 17, 2016 at 08:07:48AM -0400, Michal Hocko wrote: > On Fri 15-04-16 15:17:19, Tejun Heo wrote: > > mem_cgroup_move_charge() invokes lru_add_drain_all() so that the pvec > > pages can be moved too. lru_add_drain_all() schedules and flushes > > work items on system_wq which depends on being able to create new > > kworkers to make forward progress. Since 1ed1328792ff ("sched, > > cgroup: replace signal_struct->group_rwsem with a global > > percpu_rwsem"), a new task can't be created while in the cgroup > > migration path and the described lru_add_drain_all() invocation can > > easily lead to a deadlock. > > > > Charge moving is best-effort and whether the pvec pages are migrated > > or not doesn't really matter. Don't call it during charge moving. > > Eventually, we want to move the actual charge moving outside the > > migration path. > > > > Signed-off-by: Tejun Heo <tj@xxxxxxxxxx> > > Reported-by: Johannes Weiner <hannes@xxxxxxxxxxx> > > I guess > Debugged-by: Petr Mladek <pmladek@xxxxxxxx> > Reported-by: Cyril Hrubis <chrubis@xxxxxxx> Yeah, definitely. Sorry about missing them. > > Suggested-by: Michal Hocko <mhocko@xxxxxxxxxx> > > Fixes: 1ed1328792ff ("sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem") > > Cc: stable@xxxxxxxxxxxxxxx > > --- > > Hello, > > > > So, this deadlock seems pretty easy to trigger. We'll make the charge > > moving asynchronous eventually but let's not hold off fixing an > > immediate problem. > > Although this looks rather straightforward and it fixes the immediate > problem I am little bit nervous about it. As already pointed out in > other email mem_cgroup_move_charge still depends on mmap_sem for > read and we might hit an even more subtle lockup if the current holder > of the mmap_sem for write depends on the task creation (e.g. some of the > direct reclaim path uses WQ which is really hard to rule out and I even > think that some shrinkers do this). > > I liked your proposal when mem_cgroup_move_charge would be called from a > context which doesn't hold the problematic rwsem much more. Would that > be too intrusive for the stable backport? Yeah, I'm working on the fix but let's plug this one first as it seems really easy to trigger. I got a couple off-list reports (in and outside fb) of this triggering. Thanks. -- tejun -- To unsubscribe from this list: send the line "unsubscribe cgroups" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html