> Since be44562613851 ("cgroup: remove synchronize_rcu() from
> cgroup_diput()"), the cgroup destruction path makes use of workqueues.
> css freeing is performed from a work item from that point on, and a
> later commit, ea15f8ccdb430 ("cgroup: split cgroup destruction into
> two steps"), moves css offlining to a workqueue too.
>
> As cgroup destruction isn't depended upon for memory reclaim, the
> destruction work items were put on system_wq; unfortunately, some
> controllers may block in the destruction path for a considerable
> duration while holding cgroup_mutex. As a large part of the
> destruction path is synchronized through cgroup_mutex, when combined
> with a high rate of cgroup removals, this has the potential to fill
> up system_wq's max_active of 256.
>
> Also, it turns out that memcg's css destruction path ends up queueing
> and waiting for work items on system_wq through work_on_cpu(). If
> such an operation happens while system_wq is fully occupied by cgroup
> destruction work items, work_on_cpu() can't make forward progress
> because system_wq is full, and the other destruction work items on
> system_wq can't make forward progress because the work item waiting
> for work_on_cpu() is holding cgroup_mutex, leading to deadlock.
>
> This can be fixed by queueing destruction work items on a separate
> workqueue. This patch creates a dedicated workqueue,
> cgroup_destroy_wq, for this purpose. As these work items shouldn't
> have inter-dependencies and are mostly serialized by cgroup_mutex
> anyway, a high concurrency level doesn't buy anything, so the
> workqueue's @max_active is set to 1 and destruction work items are
> executed one by one on each CPU.
>
> Hugh Dickins: Because cgroup_init() is run before init_workqueues(),
> cgroup_destroy_wq can't be allocated from cgroup_init(). Do it from a
> separate core_initcall(). In the future, we probably want to reorder
> so that workqueue init happens before cgroup_init().
>
> Signed-off-by: Tejun Heo <tj@xxxxxxxxxx>
> Reported-by: Hugh Dickins <hughd@xxxxxxxxxx>
> Reported-by: Shawn Bohrer <shawn.bohrer@xxxxxxxxx>
> Link: http://lkml.kernel.org/r/20131111220626.GA7509@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
> Link: http://lkml.kernel.org/g/alpine.LNX.2.00.1310301606080.2333@eggly.anvils
> Cc: stable@xxxxxxxxxxxxxxx # v3.9+

Acked-by: Li Zefan <lizefan@xxxxxxxxxx>