On Thu, 31 Oct 2013, Steven Rostedt wrote:
> On Thu, 31 Oct 2013 14:46:27 -0700 (PDT)
> Hugh Dickins <hughd@xxxxxxxxxx> wrote:
> > On Thu, 31 Oct 2013, Steven Rostedt wrote:
> > > On Wed, 30 Oct 2013 19:09:19 -0700 (PDT)
> > > Hugh Dickins <hughd@xxxxxxxxxx> wrote:
> > > >
> > > > This is, at least on the face of it, distinct from the workqueue
> > > > cgroup hang I was outlining to Tejun and Michal and Steve last week:
> > > > that also strikes in mem_cgroup_reparent_charges, but in the
> > > > lru_add_drain_all rather than in mem_cgroup_start_move: the
> > > > drain of pagevecs on all cpus never completes.
> > > >
> > >
> > > Did anyone ever run this code with lockdep enabled? There is lockdep
> > > annotation in the workqueue that should catch a lot of this.
> >
> > I believe I tried before, but I've just rechecked to be sure:
> > lockdep is enabled but silent when we get into that deadlock.
>
> Interesting.
>
> Anyway, have you posted a backtrace of the latest lockups you are
> seeing? Or possible crash it and have kdump/kexec save a core?
>
> I'd like to take a look at this too.

The main backtrace looks like this (on a kernel without lockdep):

kworker/23:108  D ffff880c7fd72b00     0 25969      2 0x00000000
Workqueue: events cgroup_offline_fn
Call Trace:
 [<ffffffff81002e09>] schedule+0x29/0x70
 [<ffffffff8100039c>] schedule_timeout+0x1cc/0x290
 [<ffffffff810c5187>] ? wake_up_process+0x27/0x50
 [<ffffffff81001e08>] wait_for_completion+0x98/0x100
 [<ffffffff810c5120>] ? try_to_wake_up+0x2c0/0x2c0
 [<ffffffff810ad2b9>] flush_work+0x29/0x40
 [<ffffffff810ab8d0>] ? worker_enter_idle+0x160/0x160
 [<ffffffff810af61b>] schedule_on_each_cpu+0xcb/0x110
 [<ffffffff81160735>] lru_add_drain_all+0x15/0x20
 [<ffffffff811a9339>] mem_cgroup_reparent_charges+0x39/0x280
 [<ffffffff811ad23d>] ? hugetlb_cgroup_css_offline+0x9d/0x210
 [<ffffffff811a973f>] mem_cgroup_css_offline+0x5f/0x1e0
 [<ffffffff810fd348>] cgroup_offline_fn+0x78/0x1a0
 [<ffffffff810ae47c>] process_one_work+0x17c/0x410
 [<ffffffff810aeb71>] worker_thread+0x121/0x370
 [<ffffffff810aea50>] ? rescuer_thread+0x300/0x300
 [<ffffffff810b5c60>] kthread+0xc0/0xd0
 [<ffffffff810b5ba0>] ? flush_kthread_worker+0x80/0x80
 [<ffffffff81584c9c>] ret_from_fork+0x7c/0xb0
 [<ffffffff810b5ba0>] ? flush_kthread_worker+0x80/0x80

With lots of kworker/23:Ns looking like this one:

kworker/23:2    D ffff880c7fd72b00     0 21511      2 0x00000000
Workqueue: events cgroup_offline_fn
Call Trace:
 [<ffffffff81002e09>] schedule+0x29/0x70
 [<ffffffff810030ce>] schedule_preempt_disabled+0xe/0x10
 [<ffffffff81001469>] __mutex_lock_slowpath+0x149/0x1d0
 [<ffffffff81000822>] mutex_lock+0x22/0x40
 [<ffffffff810fd30a>] cgroup_offline_fn+0x3a/0x1a0
 [<ffffffff810ae47c>] process_one_work+0x17c/0x410
 [<ffffffff810aeb71>] worker_thread+0x121/0x370
 [<ffffffff810aea50>] ? rescuer_thread+0x300/0x300
 [<ffffffff810b5c60>] kthread+0xc0/0xd0
 [<ffffffff810c005e>] ? finish_task_switch+0x4e/0xe0
 [<ffffffff810b5ba0>] ? flush_kthread_worker+0x80/0x80
 [<ffffffff81584c9c>] ret_from_fork+0x7c/0xb0
 [<ffffffff810b5ba0>] ? flush_kthread_worker+0x80/0x80

We do have kdumps of it, but I've not had time to study those -
nor shall I be sending them out!

Reminder: these hangs are not the same as those Markus is reporting;
perhaps they are related, but I've not grasped such a connection.

Thanks,
Hugh
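
P.S. For anyone wondering where that flush_work() in the first trace comes
from: if I remember the 3.x code right, lru_add_drain_all() simply calls
schedule_on_each_cpu(lru_add_drain_per_cpu), and schedule_on_each_cpu() -
paraphrasing kernel/workqueue.c from memory here, so treat this as a sketch
rather than the exact code in any given release - does roughly this:

#include <linux/cpu.h>        /* get_online_cpus() / put_online_cpus() */
#include <linux/percpu.h>     /* alloc_percpu(), per_cpu_ptr() */
#include <linux/workqueue.h>  /* schedule_work_on(), flush_work() */

/* Sketch of schedule_on_each_cpu(): not a verbatim copy of any kernel. */
int schedule_on_each_cpu(work_func_t func)
{
	int cpu;
	struct work_struct __percpu *works;

	works = alloc_percpu(struct work_struct);
	if (!works)
		return -ENOMEM;

	get_online_cpus();

	/* Queue one work item per online cpu on the system ("events") wq. */
	for_each_online_cpu(cpu) {
		struct work_struct *work = per_cpu_ptr(works, cpu);

		INIT_WORK(work, func);
		schedule_work_on(cpu, work);
	}

	/*
	 * Wait for every one of them in turn: this flush_work() is the
	 * frame visible in the first backtrace above.
	 */
	for_each_online_cpu(cpu)
		flush_work(per_cpu_ptr(works, cpu));

	put_online_cpus();
	free_percpu(works);
	return 0;
}

So the cgroup_offline_fn worker in the first trace, having taken what I take
to be cgroup_mutex (the mutex_lock the other kworkers are piling up on in
the second trace), cannot get out of flush_work() until every cpu's events
workqueue has run its drain item; and as noted above, that drain of pagevecs
on all cpus never completes here.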