Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]

Al Viro <viro@xxxxxxxxxxxxxxxxxx> · Mon, 26 May 2014 16:27:03 +0100

On Mon, May 26, 2014 at 05:29:48PM +0300, Mika Westerberg wrote:

> I attached the dmesg with 'echo t > /proc/sysrq-trigger' included.

> [  133.826957] usb 3-10.3: USB disconnect, device number 7
> [  159.326769] BUG: soft lockup - CPU#6 stuck for 22s! [systemd-udevd:1824]
> [  159.326809] CPU: 6 PID: 1824 Comm: systemd-udevd Tainted: G          I   3.15.0-rc7 #55
> [  159.326810] Hardware name: Gigabyte Technology Co., Ltd. Z87X-UD7 TH/Z87X-UD7 TH-CF, BIOS F4 03/18/2014
> [  159.326812] task: ffff880472854a80 ti: ffff8804747ec000 task.ti: ffff8804747ec000
[snip]
> [  159.326834] Call Trace:
> [  159.326838]  [<ffffffff811e74e6>] dentry_kill+0x36/0x280
> [  159.326840]  [<ffffffff811e793a>] shrink_dentry_list+0x8a/0x110
> [  159.326842]  [<ffffffff811e81c4>] check_submounts_and_drop+0x74/0xa0
> [  159.326845]  [<ffffffff81245c5d>] kernfs_dop_revalidate+0x5d/0xd0
> [  159.326847]  [<ffffffff811dba4d>] lookup_fast+0x26d/0x2c0
> [  159.326849]  [<ffffffff811dd2b5>] path_lookupat+0x155/0x780
> [  159.326851]  [<ffffffff811dc152>] ? final_putname+0x22/0x50
> [  159.326853]  [<ffffffff811e198f>] ? user_path_at_empty+0x5f/0x90
> [  159.326856]  [<ffffffff811b6eb5>] ? kmem_cache_alloc+0x35/0x1f0
> [  159.326858]  [<ffffffff811dc1cf>] ? getname_flags+0x4f/0x1a0
> [  159.326860]  [<ffffffff811dd90b>] filename_lookup+0x2b/0xc0
> [  159.326862]  [<ffffffff811e1984>] user_path_at_empty+0x54/0x90
> [  159.326865]  [<ffffffff811d65ff>] ? SYSC_newstat+0x1f/0x40
> [  159.326867]  [<ffffffff811d69ec>] SyS_readlink+0x4c/0x130
> [  159.326870]  [<ffffffff81113c76>] ? __audit_syscall_exit+0x1f6/0x2a0
> [  159.326872]  [<ffffffff816ade69>] system_call_fastpath+0x16/0x1b

That's the livelock.  OK.  But in the stack traces below
	a) that systemd-udevd instance is happily running in userland
and
	b) the only traces of dentry_kill() in the stack are noise
in stacks of gdbus and dbus-daemon.

*grumble*  I wonder if we should instrument d_shrink_add()/d_lru_shrink_move()
so that they would tag dentry with pointer to current, allowing us to report
something useful in select_collect()...
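
To make that instrumentation idea concrete, here is a rough userspace sketch
of it, not kernel code and not a patch.  Everything in it is hypothetical:
struct task_model, struct dentry_model, the "shrinker" field and the model_*()
helpers are made-up stand-ins for current, struct dentry, a new debug pointer
and d_shrink_add()/select_collect() respectively.  It only shows the shape of
"remember who put the dentry on a shrink list, report it when somebody else
trips over it".

```c
#include <stddef.h>
#include <stdio.h>

/* Userspace model only; NOT the kernel's task_struct or struct dentry. */
struct task_model {
	int pid;
	const char *comm;
};

struct dentry_model {
	const char *name;
	int on_shrink_list;			/* stands in for DCACHE_SHRINK_LIST */
	const struct task_model *shrinker;	/* hypothetical debug tag */
};

/* Model of d_shrink_add(): mark the dentry and remember who did it. */
static void model_shrink_add(struct dentry_model *d,
			     const struct task_model *cur)
{
	d->on_shrink_list = 1;
	d->shrinker = cur;
}

/* Model of d_shrink_del(): clear the mark and the tag together. */
static void model_shrink_del(struct dentry_model *d)
{
	d->on_shrink_list = 0;
	d->shrinker = NULL;
}

/*
 * Model of the reporting side (the select_collect() end of things).
 * Returns 1 and fills buf if the dentry sits on a shrink list that
 * does not belong to cur, 0 otherwise.
 */
static int model_report_busy(const struct dentry_model *d,
			     const struct task_model *cur,
			     char *buf, size_t len)
{
	if (!d->on_shrink_list || d->shrinker == cur)
		return 0;
	if (!d->shrinker)
		snprintf(buf, len, "%s: on shrink list of <unknown>", d->name);
	else
		snprintf(buf, len, "%s: on shrink list of %s[%d]",
			 d->name, d->shrinker->comm, d->shrinker->pid);
	return 1;
}
```

In the real thing the tag would presumably be debug-only and the report
would go through printk; whether the extra pointer in struct dentry is
acceptable even for debugging is a separate question.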

What's really strange is that the same livelock seems to repeat at least
once more; dentries involved the first time around should've been dead
and buried by then.  And AFAICS, in the log you originally posted
exactly that happened, both times to the same process...

Do these livelocks keep happening indefinitely, once triggered?  IOW,
is that a buggered state of dcache and/or kernfs, or is it a transient pileup
that happens when we invalidate a subtree there?