Re: [BUG] cgroup/workqueues/fork: deadlock when moving cgroups

On Wed 13-04-16 21:23:13, Michal Hocko wrote:
> On Wed 13-04-16 14:33:09, Tejun Heo wrote:
> > Hello, Petr.
> > 
> > (cc'ing Johannes)
> > 
> > On Wed, Apr 13, 2016 at 11:42:16AM +0200, Petr Mladek wrote:
> > ...
> > > In other words, "memcg_move_char/2860" flushes a work item, but the
> > > work cannot get flushed because one worker is blocked and another
> > > one cannot be created. All these operations are blocked by the very
> > > same "memcg_move_char/2860".
> > > 
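For context, the flushing pattern in question looks roughly like this
(a simplified sketch of the current lru_add_drain_all(), not the exact
code; the real function also skips CPUs that have no pending pagevecs):

static DEFINE_PER_CPU(struct work_struct, lru_add_drain_work);

void lru_add_drain_all(void)
{
	static DEFINE_MUTEX(lock);
	int cpu;

	mutex_lock(&lock);
	get_online_cpus();

	for_each_online_cpu(cpu) {
		struct work_struct *work = &per_cpu(lru_add_drain_work, cpu);

		INIT_WORK(work, lru_add_drain_per_cpu);
		/* system_wq has no rescuer; may need to fork a worker */
		schedule_work_on(cpu, work);
	}

	/*
	 * flush_work() sleeps until a worker has run the item.  If the
	 * caller holds a lock that fork() needs and a new worker must be
	 * created to make progress, this wait never finishes.
	 */
	for_each_online_cpu(cpu)
		flush_work(&per_cpu(lru_add_drain_work, cpu));

	put_online_cpus();
	mutex_unlock(&lock);
}
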
> > > Note that "systemd/1" is also waiting for "cgroup_mutex" in
> > > proc_cgroup_show(), but it does not seem to be part of the
> > > cycle that causes the deadlock.
> > > 
> > > I am able to reproduce this problem quite easily (within a few
> > > minutes). There are often even more tasks waiting for the
> > > cgroup-related locks, but they are not part of the deadlock cycle.
> > > 
> > > 
> > > The question is how to solve this problem. I see several possibilities:
> > > 
> > >   + avoid using workqueues in lru_add_drain_all()
> > > 
> > >   + make lru_add_drain_all() killable and restartable (see the
> > >     sketch below this list)
> > > 
> > >   + do not block fork() while lru_add_drain_all() is running,
> > >     e.g. using lazy techniques such as RCU or workqueues
> > > 
> > >   + at least do not block forking of workers; AFAIK, they have
> > >     limited cgroup usage anyway because they are marked with
> > >     PF_NO_SETAFFINITY
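
A minimal sketch of the killable variant's locking side (the function
name is made up, and flush_work() itself is not interruptible, so the
flushing phase would still need a separate solution):

int lru_add_drain_all_killable(void)
{
	static DEFINE_MUTEX(lock);

	/* back out if a fatal signal arrives while waiting */
	if (mutex_lock_killable(&lock))
		return -EINTR;

	/* ... schedule and flush the per-cpu drain work as above ... */

	mutex_unlock(&lock);
	return 0;
}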
> > > 
> > > 
> > > I am willing to test any potential fix or even work on the fix,
> > > but I do not have deep insight into the problem, so I would
> > > need some pointers.
> > 
> > An easy solution would be to make lru_add_drain_all() use a
> > WQ_MEM_RECLAIM workqueue.
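
For reference, a sketch of that approach; "lru_drain_wq" is a made-up
name, not an existing symbol:

/*
 * WQ_MEM_RECLAIM guarantees the queue a rescuer thread, so queued work
 * can make progress even when no new kworker can be forked.
 */
static struct workqueue_struct *lru_drain_wq;

static int __init lru_drain_wq_init(void)
{
	lru_drain_wq = alloc_workqueue("lru_drain", WQ_MEM_RECLAIM, 0);
	return lru_drain_wq ? 0 : -ENOMEM;
}
core_initcall(lru_drain_wq_init);

lru_add_drain_all() would then queue each per-cpu work item with

	queue_work_on(cpu, lru_drain_wq, work);

instead of schedule_work_on(cpu, work), and flush as before.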
> 
> I think we can live without lru_add_drain_all() in the migration path.
> We are talking about 4 pagevecs, so 56 pages. The charge migration is

I wanted to say 56 * num_cpus, of course.
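(4 per-cpu pagevecs of PAGEVEC_SIZE == 14 entries each, i.e. 4 * 14 = 56
pages per CPU, times the number of online CPUs in total.)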

> racy anyway. What concerns me more is how fragile all this is. It
> sounds just too easy to add a dependency on per-cpu sync work later
> and reintroduce this issue, which is quite hard to detect.
> 
> Can't we come up with something more robust? Or at least warn when we
> try to use per-cpu workers while problematic locks are held?
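
Such a warning could be a lockdep-based assertion in the flush path,
along the lines of this hypothetical sketch (the helper name and the
exact set of "problematic" locks are assumptions, not existing API):

static inline void warn_on_fork_blocking_locks(void)
{
#ifdef CONFIG_LOCKDEP
	/* flushing system_wq work may require forking a new worker */
	WARN_ONCE(lockdep_is_held(&cgroup_mutex),
		  "flushing per-cpu work under cgroup_mutex, worker fork may deadlock\n");
#endif
}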
> 
> Thanks!
-- 
Michal Hocko
SUSE Labs


