> After 96d365e0b86e ("cgroup: make css_set_lock a rwsem and rename it > to css_set_rwsem"), css task iterators requires sleepable context as > it may block on css_set_rwsem. I missed that cgroup_freezer was > iterating tasks under IRQ-safe spinlock freezer->lock. This leads to > errors like the following on freezer state reads and transitions. > > BUG: sleeping function called from invalid context at /work > /os/work/kernel/locking/rwsem.c:20 > in_atomic(): 0, irqs_disabled(): 0, pid: 462, name: bash > 5 locks held by bash/462: > #0: (sb_writers#7){.+.+.+}, at: [<ffffffff811f0843>] vfs_write+0x1a3/0x1c0 > #1: (&of->mutex){+.+.+.}, at: [<ffffffff8126d78b>] kernfs_fop_write+0xbb/0x170 > #2: (s_active#70){.+.+.+}, at: [<ffffffff8126d793>] kernfs_fop_write+0xc3/0x170 > #3: (freezer_mutex){+.+...}, at: [<ffffffff81135981>] freezer_write+0x61/0x1e0 > #4: (rcu_read_lock){......}, at: [<ffffffff81135973>] freezer_write+0x53/0x1e0 > Preemption disabled at:[<ffffffff81104404>] console_unlock+0x1e4/0x460 > > CPU: 3 PID: 462 Comm: bash Not tainted 3.15.0-rc1-work+ #10 > Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 > ffff88000916a6d0 ffff88000e0a3da0 ffffffff81cf8c96 0000000000000000 > ffff88000e0a3dc8 ffffffff810cf4f2 ffffffff82388040 ffff880013aaf740 > 0000000000000002 ffff88000e0a3de8 ffffffff81d05974 0000000000000246 > Call Trace: > [<ffffffff81cf8c96>] dump_stack+0x4e/0x7a > [<ffffffff810cf4f2>] __might_sleep+0x162/0x260 > [<ffffffff81d05974>] down_read+0x24/0x60 > [<ffffffff81133e87>] css_task_iter_start+0x27/0x70 > [<ffffffff8113584d>] freezer_apply_state+0x5d/0x130 > [<ffffffff81135a16>] freezer_write+0xf6/0x1e0 > [<ffffffff8112eb88>] cgroup_file_write+0xd8/0x230 > [<ffffffff8126d7b7>] kernfs_fop_write+0xe7/0x170 > [<ffffffff811f0756>] vfs_write+0xb6/0x1c0 > [<ffffffff811f121d>] SyS_write+0x4d/0xc0 > [<ffffffff81d08292>] system_call_fastpath+0x16/0x1b > > freezer->lock used to be used in hot paths but that time is long gone > and there's no reason for the lock to be IRQ-safe spinlock or even > per-cgroup. In fact, given the fact that a cgroup may contain large > number of tasks, it's not a good idea to iterate over them while > holding IRQ-safe spinlock. > > Let's simplify locking by replacing per-cgroup freezer->lock with > global freezer_mutex. This also makes the comments explaining the > intricacies of policy inheritance and the locking around it as the > states are protected by a common mutex. > > The conversion is mostly straight-forward. The followings are worth > mentioning. > > * freezer_css_online() no longer needs double locking. > > * freezer_attach() now performs propagation simply while holding > freezer_mutex. update_if_frozen() race no longer exists and the > comment is removed. > > * freezer_fork() now tests whether the task is in root cgroup using > the new task_css_is_root() without doing rcu_read_lock/unlock(). If > not, it grabs freezer_mutex and performs the operation. > > * freezer_read() and freezer_change_state() grab freezer_mutex across > the whole operation and pin the css while iterating so that each > descendant processing happens in sleepable context. > > Fixes: 96d365e0b86e ("cgroup: make css_set_lock a rwsem and rename it to css_set_rwsem") > Signed-off-by: Tejun Heo <tj@xxxxxxxxxx> Acked-by: Li Zefan <lizefan@xxxxxxxxxx> -- To unsubscribe from this list: send the line "unsubscribe cgroups" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html