Re: Possible regression with cgroups in 3.11

Markus Blank-Burian <burian@xxxxxxxxxxx> · Fri, 18 Oct 2013 11:34:15 +0200



I think I found out where it is hanging: while waiting for the
test runs to trigger the bug, I ran "echo w > /proc/sysrq-trigger"
to dump the stacks of all blocked tasks, and one of them was always
this one:

[586147.824671] kworker/3:5     D ffff8800df81e208     0 10909      2 0x00000000
[586147.824671] Workqueue: events cgroup_offline_fn
[586147.824671]  ffff8800fba7bbd0 0000000000000002 ffff88007afc2ee0 ffff8800fba7bfd8
[586147.824671]  ffff8800fba7bfd8 0000000000011c40 ffff8800df81ddc0 7fffffffffffffff
[586147.824671]  ffff8800fba7bcf8 ffff8800df81ddc0 0000000000000002 ffff8800fba7bcf0
[586147.824671] Call Trace:
[586147.824671]  [<ffffffff813c57e4>] schedule+0x60/0x62
[586147.824671]  [<ffffffff813c374c>] schedule_timeout+0x34/0x11c
[586147.824671]  [<ffffffff81053305>] ? __wake_up_common+0x51/0x7e
[586147.824671]  [<ffffffff813c6a73>] ? _raw_spin_unlock_irqrestore+0x29/0x34
[586147.824671]  [<ffffffff813c5097>] __wait_for_common+0x9c/0x119
[586147.824671]  [<ffffffff813c3718>] ? svcauth_gss_legacy_init+0x176/0x176
[586147.824671]  [<ffffffff8105790d>] ? wake_up_state+0xd/0xd
[586147.824671]  [<ffffffff8109c237>] ? call_rcu_bh+0x18/0x18
[586147.824671]  [<ffffffff813c5133>] wait_for_completion+0x1f/0x21
[586147.824671]  [<ffffffff8104a8ee>] wait_rcu_gp+0x46/0x4c
[586147.824671]  [<ffffffff8104a899>] ? __rcu_read_unlock+0x4c/0x4c
[586147.824671]  [<ffffffff8109ad6b>] synchronize_rcu+0x29/0x2b
[586147.824671]  [<ffffffff810ec34e>] mem_cgroup_reparent_charges+0x63/0x2fb
[586147.824671]  [<ffffffff810ec75a>] mem_cgroup_css_offline+0xa5/0x14a
[586147.824671]  [<ffffffff8108329e>] offline_css.part.15+0x1b/0x2e
[586147.824671]  [<ffffffff81084f8b>] cgroup_offline_fn+0x72/0x137
[586147.824671]  [<ffffffff81047cb7>] process_one_work+0x15f/0x21e
[586147.824671]  [<ffffffff81048159>] worker_thread+0x144/0x1f0
[586147.824671]  [<ffffffff81048015>] ? rescuer_thread+0x275/0x275
[586147.824671]  [<ffffffff8104cbec>] kthread+0x88/0x90
[586147.824671]  [<ffffffff8104cb64>] ? __kthread_parkme+0x60/0x60
[586147.824671]  [<ffffffff813c756c>] ret_from_fork+0x7c/0xb0
[586147.824671]  [<ffffffff8104cb64>] ? __kthread_parkme+0x60/0x60
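
For anyone who wants to pick such tasks out of a longer dump, here is a
minimal sketch that extracts the blocked (state D) tasks and their reliable
call-trace frames from "echo w" output saved to a string. It assumes the log
layout shown above. The helper name blocked_tasks and both regular
expressions are illustrative assumptions, not part of any kernel tool, and
frames marked with "?" are skipped as unreliable.

```python
import re

# Task header line, e.g.
# "[586147.824671] kworker/3:5     D ffff8800df81e208     0 10909      2 0x00000000"
# Assumption: the task name contains no whitespace.
TASK_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(?P<comm>\S+)\s+(?P<state>[A-Z])\s+[0-9a-f]{16}"
    r"\s+\d+\s+(?P<pid>\d+)\s+\d+\s+0x[0-9a-f]+\s*$"
)

# Call-trace frame without the "?" marker, e.g.
# "[586147.824671]  [<ffffffff8109ad6b>] synchronize_rcu+0x29/0x2b"
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+\[<[0-9a-f]+>\]\s+(?P<sym>[\w.]+)\+0x"
)


def blocked_tasks(dump):
    """Return the state-D tasks found in a sysrq-w style dump.

    Each task is a dict with the keys "comm", "pid", "state" and "frames",
    where "frames" lists the reliable symbols from innermost to outermost.
    """
    tasks = []
    current = None
    for line in dump.splitlines():
        m = TASK_RE.match(line)
        if m:
            current = {
                "comm": m.group("comm"),
                "pid": int(m.group("pid")),
                "state": m.group("state"),
                "frames": [],
            }
            tasks.append(current)
            continue
        m = FRAME_RE.match(line)
        if m and current is not None:
            current["frames"].append(m.group("sym"))
    return [t for t in tasks if t["state"] == "D"]


# Shortened excerpt of the trace above, used as sample input.
SAMPLE = """\
[586147.824671] kworker/3:5     D ffff8800df81e208     0 10909      2 0x00000000
[586147.824671] Workqueue: events cgroup_offline_fn
[586147.824671] Call Trace:
[586147.824671]  [<ffffffff813c57e4>] schedule+0x60/0x62
[586147.824671]  [<ffffffff81053305>] ? __wake_up_common+0x51/0x7e
[586147.824671]  [<ffffffff8109ad6b>] synchronize_rcu+0x29/0x2b
[586147.824671]  [<ffffffff810ec34e>] mem_cgroup_reparent_charges+0x63/0x2fb
"""

if __name__ == "__main__":
    for task in blocked_tasks(SAMPLE):
        print(task["comm"], task["pid"], " <- ".join(task["frames"]))
```

Fed with the full dmesg output, this makes it easy to see whether the same
kworker is stuck under synchronize_rcu in every dump.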


On Tue, Oct 15, 2013 at 5:15 AM, Li Zefan <lizefan@xxxxxxxxxx> wrote:
> On 2013/10/14 16:06, Markus Blank-Burian wrote:
>> The crash utility indicated that the lock was held by a kworker
>> thread which was idle at the moment. So there might be a case where
>> no unlock is done. I am trying to reproduce the problem at the moment
>> with CONFIG_PROVE_LOCKING, but without luck so far. It seems that my
>> test job is quite bad at reproducing the bug. I'll let you know if I
>> can find out more.
>>
>
> Thanks. I'll review the code to see if I can find some suspect.
>
> PS: I'll be travelling from 10/16 ~ 10/28, so I may not be able
> to spend much time on this.
>
> --
> To unsubscribe from this list: send the line "unsubscribe cgroups" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html