Re: 3.10.16 cgroup_mutex deadlock

Michal Hocko <mhocko@xxxxxxx> · Tue, 12 Nov 2013 15:31:47 +0100

On Tue 12-11-13 18:17:20, Li Zefan wrote:
> Cc more people
> 
> On 2013/11/12 6:06, Shawn Bohrer wrote:
> > Hello,
> > 
> > This morning I had a machine running 3.10.16 go unresponsive but
> > before we killed it we were able to get the information below.  I'm
> > not an expert here but it looks like most of the tasks below are
> > blocking waiting on the cgroup_mutex.  You can see that the
> > resource_alloca:16502 task is holding the cgroup_mutex and that task
> > appears to be waiting on a lru_add_drain_all() to complete.

Do you have sysrq+l output as well by any chance? That would tell
us what each CPU is currently doing. Dumping all kworker stacks
might be helpful as well. We know that lru_add_drain_all waits for
schedule_on_each_cpu to return, so it is waiting for the per-CPU
workers to finish. I would be really curious why some of the
lru_add_drain_cpu calls cannot finish properly. The only reasons I
can see are that some work item(s) do not get CPU time or that
somebody is holding lru_lock.
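
In case it helps, here is a minimal sketch of how that output could be
collected the next time the machine gets into this state. It assumes
root and a kernel built with CONFIG_MAGIC_SYSRQ; the helper name
sysrq and the SYSRQ_TRIGGER variable are made up for illustration.

```shell
# Assumption: run as root on a kernel with CONFIG_MAGIC_SYSRQ enabled.
# SYSRQ_TRIGGER is a hypothetical override, useful only for dry runs.
SYSRQ_TRIGGER="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"

sysrq() {
    # Write a single sysrq command character to the trigger file.
    # 'l' = backtrace of all active CPUs
    # 't' = dump all tasks with their stacks (kworkers included)
    printf '%s\n' "$1" > "$SYSRQ_TRIGGER"
}
```

Calling "sysrq l" followed by "sysrq t" sends both dumps to the kernel
log, and dmesg (or the serial console) then shows the result.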

-- 
Michal Hocko
SUSE Labs