Re: Possible regression with cgroups in 3.11

Tejun Heo <tj@xxxxxxxxxx> · Wed, 13 Nov 2013 12:28:04 +0900

Hello,

On Thu, Oct 31, 2013 at 02:46:27PM -0700, Hugh Dickins wrote:
> On Thu, 31 Oct 2013, Steven Rostedt wrote:
> > On Wed, 30 Oct 2013 19:09:19 -0700 (PDT)
> > Hugh Dickins <hughd@xxxxxxxxxx> wrote:
> > 
> > > This is, at least on the face of it, distinct from the workqueue
> > > cgroup hang I was outlining to Tejun and Michal and Steve last week:
> > > that also strikes in mem_cgroup_reparent_charges, but in the
> > > lru_add_drain_all rather than in mem_cgroup_start_move: the
> > > drain of pagevecs on all cpus never completes.
> > > 
> > 
> > Did anyone ever run this code with lockdep enabled? There is lockdep
> > annotation in the workqueue that should catch a lot of this.
> 
> I believe I tried before, but I've just rechecked to be sure:
> lockdep is enabled but silent when we get into that deadlock.

Ooh... I just realized that work_on_cpu() explicitly opts out of flush
lockdep verification by calling __flush_work() instead of flush_work(),
so that a work_on_cpu() callback can itself call work_on_cpu()
recursively.  The commit is c2fda509667b ("workqueue: allow
work_on_cpu() to be called recursively").  So, if we have an actual
deadlock scenario involving work_on_cpu(), it may escape lockdep
detection.  I'll see if I can update the lockdep annotation so that it
still allows recursive invocation without disabling lockdep verification
completely.

Thanks.

-- 
tejun