(Adding Greg to cc to see if he recalls seeing issues with sysfs dentries suffering from recursive locking recently)

On Thu, Jan 05, 2012 at 10:35:04AM -0800, Paul E. McKenney wrote:
> On Thu, Jan 05, 2012 at 04:35:29PM +0000, Russell King - ARM Linux wrote:
> > On Thu, Jan 05, 2012 at 04:17:39PM +0000, Mel Gorman wrote:
> > > Link please?
> > 
> > Forwarded, as it's still in my mailbox.
> > 
> > > I'm including a patch below under development that is
> > > intended to only cope with the page allocator case under heavy memory
> > > pressure. Currently it does not pass testing because eventually RCU
> > > gets stalled with the following trace
> > > 
> > > [ 1817.176001] [<ffffffff810214d7>] arch_trigger_all_cpu_backtrace+0x87/0xa0
> > > [ 1817.176001] [<ffffffff810c4779>] __rcu_pending+0x149/0x260
> > > [ 1817.176001] [<ffffffff810c48ef>] rcu_check_callbacks+0x5f/0x110
> > > [ 1817.176001] [<ffffffff81068d7f>] update_process_times+0x3f/0x80
> > > [ 1817.176001] [<ffffffff8108c4eb>] tick_sched_timer+0x5b/0xc0
> > > [ 1817.176001] [<ffffffff8107f28e>] __run_hrtimer+0xbe/0x1a0
> > > [ 1817.176001] [<ffffffff8107f581>] hrtimer_interrupt+0xc1/0x1e0
> > > [ 1817.176001] [<ffffffff81020ef3>] smp_apic_timer_interrupt+0x63/0xa0
> > > [ 1817.176001] [<ffffffff81449073>] apic_timer_interrupt+0x13/0x20
> > > [ 1817.176001] [<ffffffff8116c135>] vfsmount_lock_local_lock+0x25/0x30
> > > [ 1817.176001] [<ffffffff8115c855>] path_init+0x2d5/0x370
> > > [ 1817.176001] [<ffffffff8115eecd>] path_lookupat+0x2d/0x620
> > > [ 1817.176001] [<ffffffff8115f4ef>] do_path_lookup+0x2f/0xd0
> > > [ 1817.176001] [<ffffffff811602af>] user_path_at_empty+0x9f/0xd0
> > > [ 1817.176001] [<ffffffff81154e7b>] vfs_fstatat+0x4b/0x90
> > > [ 1817.176001] [<ffffffff81154f4f>] sys_newlstat+0x1f/0x50
> > > [ 1817.176001] [<ffffffff81448692>] system_call_fastpath+0x16/0x1b
> > > 
> > > It might be a separate bug, don't know for sure.
I rebased the patch on top of 3.2 and tested again with a bunch of
debugging options set (PROVE_RCU, PROVE_LOCKING etc.). Same results. CPU
hotplug is a lot more reliable and less likely to hang, but eventually
gets into trouble. Taking a closer look though, I don't think this is an
RCU problem; RCU is just the messenger.

> Do you get multiple RCU CPU stall-warning messages?

Yes, one roughly every 50000 jiffies or so (HZ=250).

[ 878.315029] INFO: rcu_sched detected stall on CPU 3 (t=16250 jiffies)
[ 878.315032] INFO: rcu_sched detected stall on CPU 6 (t=16250 jiffies)
[ 1072.878669] INFO: rcu_sched detected stall on CPU 3 (t=65030 jiffies)
[ 1072.878672] INFO: rcu_sched detected stall on CPU 6 (t=65030 jiffies)
[ 1267.442308] INFO: rcu_sched detected stall on CPU 3 (t=113810 jiffies)
[ 1267.442312] INFO: rcu_sched detected stall on CPU 6 (t=113810 jiffies)
[ 1462.005948] INFO: rcu_sched detected stall on CPU 3 (t=162590 jiffies)
[ 1462.005952] INFO: rcu_sched detected stall on CPU 6 (t=162590 jiffies)
[ 1656.569588] INFO: rcu_sched detected stall on CPU 3 (t=211370 jiffies)
[ 1656.569592] INFO: rcu_sched detected stall on CPU 6 (t=211370 jiffies)
[ 1851.133229] INFO: rcu_sched detected stall on CPU 6 (t=260150 jiffies)
[ 1851.133233] INFO: rcu_sched detected stall on CPU 3 (t=260150 jiffies)
[ 2045.696868] INFO: rcu_sched detected stall on CPU 3 (t=308930 jiffies)
[ 2045.696872] INFO: rcu_sched detected stall on CPU 6 (t=308930 jiffies)
[ 2240.260508] INFO: rcu_sched detected stall on CPU 6 (t=357710 jiffies)
[ 2240.260511] INFO: rcu_sched detected stall on CPU 3 (t=357710 jiffies)

> If so, it can be helpful to look at how the stack frame changes over
> time. These stalls are normally caused by a loop in the kernel with
> preemption disabled, though other scenarios can also cause them.
The stacks are not changing much over time and start with this:

[ 878.315029] INFO: rcu_sched detected stall on CPU 3 (t=16250 jiffies)
[ 878.315032] INFO: rcu_sched detected stall on CPU 6 (t=16250 jiffies)
[ 878.315036] Pid: 4422, comm: udevd Not tainted 3.2.0-guardipi-v1r6 #2
[ 878.315037] Call Trace:
[ 878.315038]  <IRQ>  [<ffffffff810a8b20>] __rcu_pending+0x8e/0x36c
[ 878.315052]  [<ffffffff81071b9a>] ? tick_nohz_handler+0xdc/0xdc
[ 878.315054]  [<ffffffff810a8f04>] rcu_check_callbacks+0x106/0x172
[ 878.315056]  [<ffffffff810528e0>] update_process_times+0x3f/0x76
[ 878.315058]  [<ffffffff81071c0a>] tick_sched_timer+0x70/0x9a
[ 878.315060]  [<ffffffff8106654e>] __run_hrtimer+0xc7/0x157
[ 878.315062]  [<ffffffff810667ec>] hrtimer_interrupt+0xba/0x18a
[ 878.315065]  [<ffffffff8134fbad>] smp_apic_timer_interrupt+0x86/0x99
[ 878.315067]  [<ffffffff8134dbf3>] apic_timer_interrupt+0x73/0x80
[ 878.315068]  <EOI>  [<ffffffff81345f34>] ? retint_restore_args+0x13/0x13
[ 878.315072]  [<ffffffff81139591>] ? __shrink_dcache_sb+0x7d/0x19f
[ 878.315075]  [<ffffffff81008c6e>] ? native_read_tsc+0x1/0x16
[ 878.315077]  [<ffffffff811df434>] ? delay_tsc+0x3a/0x82
[ 878.315079]  [<ffffffff811df4a1>] __delay+0xf/0x11
[ 878.315081]  [<ffffffff811e51e5>] do_raw_spin_lock+0xb5/0xf9
[ 878.315083]  [<ffffffff81345561>] _raw_spin_lock+0x39/0x3d
[ 878.315085]  [<ffffffff8113972a>] ? shrink_dcache_parent+0x77/0x28c
[ 878.315087]  [<ffffffff8113972a>] shrink_dcache_parent+0x77/0x28c
[ 878.315089]  [<ffffffff8113741d>] ? have_submounts+0x13e/0x1bd
[ 878.315092]  [<ffffffff81185970>] sysfs_dentry_revalidate+0xaa/0xbe
[ 878.315093]  [<ffffffff8112e731>] do_lookup+0x263/0x2fc
[ 878.315096]  [<ffffffff8119ca13>] ? security_inode_permission+0x1e/0x20
[ 878.315098]  [<ffffffff8112f33d>] link_path_walk+0x1e2/0x763
[ 878.315099]  [<ffffffff8112fd66>] path_lookupat+0x5c/0x61a
[ 878.315102]  [<ffffffff810f4810>] ? might_fault+0x89/0x8d
[ 878.315104]  [<ffffffff810f47c7>] ? might_fault+0x40/0x8d
[ 878.315105]  [<ffffffff8113034e>] do_path_lookup+0x2a/0xa8
[ 878.315107]  [<ffffffff81132a51>] user_path_at_empty+0x5d/0x97
[ 878.315109]  [<ffffffff8107447f>] ? trace_hardirqs_off+0xd/0xf
[ 878.315111]  [<ffffffff81345c4f>] ? _raw_spin_unlock_irqrestore+0x44/0x5a
[ 878.315112]  [<ffffffff81132a9c>] user_path_at+0x11/0x13
[ 878.315115]  [<ffffffff81128b64>] vfs_fstatat+0x44/0x71
[ 878.315117]  [<ffffffff81128bef>] vfs_lstat+0x1e/0x20
[ 878.315118]  [<ffffffff81128c10>] sys_newlstat+0x1f/0x40
[ 878.315120]  [<ffffffff810759a8>] ? trace_hardirqs_on_caller+0x12d/0x164
[ 878.315122]  [<ffffffff811e057e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 878.315124]  [<ffffffff8107447f>] ? trace_hardirqs_off+0xd/0xf
[ 878.315126]  [<ffffffff8134d082>] system_call_fastpath+0x16/0x1b
[ 878.557790] Pid: 5704, comm: udevd Not tainted 3.2.0-guardipi-v1r6 #2
[ 878.564226] Call Trace:
[ 878.566677]  <IRQ>  [<ffffffff810a8b20>] __rcu_pending+0x8e/0x36c
[ 878.572783]  [<ffffffff81071b9a>] ? tick_nohz_handler+0xdc/0xdc
[ 878.578702]  [<ffffffff810a8f04>] rcu_check_callbacks+0x106/0x172
[ 878.584794]  [<ffffffff810528e0>] update_process_times+0x3f/0x76
[ 878.590798]  [<ffffffff81071c0a>] tick_sched_timer+0x70/0x9a
[ 878.596459]  [<ffffffff8106654e>] __run_hrtimer+0xc7/0x157
[ 878.601944]  [<ffffffff810667ec>] hrtimer_interrupt+0xba/0x18a
[ 878.607778]  [<ffffffff8134fbad>] smp_apic_timer_interrupt+0x86/0x99
[ 878.614129]  [<ffffffff8134dbf3>] apic_timer_interrupt+0x73/0x80
[ 878.620134]  <EOI>  [<ffffffff81051e66>] ? run_timer_softirq+0x49/0x32a
[ 878.626759]  [<ffffffff81139591>] ? __shrink_dcache_sb+0x7d/0x19f
[ 878.632851]  [<ffffffff811df402>] ? delay_tsc+0x8/0x82
[ 878.637988]  [<ffffffff811df4a1>] __delay+0xf/0x11
[ 878.642778]  [<ffffffff811e51e5>] do_raw_spin_lock+0xb5/0xf9
[ 878.648437]  [<ffffffff81345561>] _raw_spin_lock+0x39/0x3d
[ 878.653920]  [<ffffffff8113972a>] ? shrink_dcache_parent+0x77/0x28c
[ 878.660186]  [<ffffffff8113972a>] shrink_dcache_parent+0x77/0x28c
[ 878.666277]  [<ffffffff8113741d>] ? have_submounts+0x13e/0x1bd
[ 878.672107]  [<ffffffff81185970>] sysfs_dentry_revalidate+0xaa/0xbe
[ 878.678372]  [<ffffffff8112e731>] do_lookup+0x263/0x2fc
[ 878.683596]  [<ffffffff8119ca13>] ? security_inode_permission+0x1e/0x20
[ 878.690207]  [<ffffffff8112f33d>] link_path_walk+0x1e2/0x763
[ 878.695866]  [<ffffffff8112fd66>] path_lookupat+0x5c/0x61a
[ 878.701350]  [<ffffffff810f4810>] ? might_fault+0x89/0x8d
[ 878.706747]  [<ffffffff810f47c7>] ? might_fault+0x40/0x8d
[ 878.712145]  [<ffffffff8113034e>] do_path_lookup+0x2a/0xa8
[ 878.717630]  [<ffffffff81132a51>] user_path_at_empty+0x5d/0x97
[ 878.723463]  [<ffffffff8107447f>] ? trace_hardirqs_off+0xd/0xf
[ 878.729295]  [<ffffffff81345c4f>] ? _raw_spin_unlock_irqrestore+0x44/0x5a
[ 878.736080]  [<ffffffff81132a9c>] user_path_at+0x11/0x13
[ 878.741391]  [<ffffffff81128b64>] vfs_fstatat+0x44/0x71
[ 878.746616]  [<ffffffff81128bef>] vfs_lstat+0x1e/0x20
[ 878.751668]  [<ffffffff81128c10>] sys_newlstat+0x1f/0x40
[ 878.756981]  [<ffffffff810759a8>] ? trace_hardirqs_on_caller+0x12d/0x164
[ 878.763678]  [<ffffffff811e057e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 878.770116]  [<ffffffff8107447f>] ? trace_hardirqs_off+0xd/0xf
[ 878.775949]  [<ffffffff8134d082>] system_call_fastpath+0x16/0x1b
[ 908.769486] BUG: spinlock lockup on CPU#6, udevd/4422
[ 908.774547]  lock: ffff8803b4c701c8, .magic: dead4ead, .owner: udevd/5709, .owner_cpu: 4

Seeing that the owner was CPU 4, I found earlier in the log

[ 815.244051] BUG: spinlock lockup on CPU#4, udevd/5709
[ 815.249103]  lock: ffff8803b4c701c8, .magic: dead4ead, .owner: udevd/5709, .owner_cpu: 4
[ 815.258430] Pid: 5709, comm: udevd Not tainted 3.2.0-guardipi-v1r6 #2
[ 815.264866] Call Trace:
[ 815.267329]  [<ffffffff811e507d>] spin_dump+0x88/0x8d
[ 815.272388]  [<ffffffff811e5206>] do_raw_spin_lock+0xd6/0xf9
[ 815.278062]  [<ffffffff81345561>] ? _raw_spin_lock+0x39/0x3d
[ 815.283720]  [<ffffffff8113972a>] ? shrink_dcache_parent+0x77/0x28c
[ 815.289986]  [<ffffffff8113972a>] ? shrink_dcache_parent+0x77/0x28c
[ 815.296249]  [<ffffffff8113741d>] ? have_submounts+0x13e/0x1bd
[ 815.302080]  [<ffffffff81185970>] ? sysfs_dentry_revalidate+0xaa/0xbe
[ 815.308515]  [<ffffffff8112e731>] ? do_lookup+0x263/0x2fc
[ 815.313915]  [<ffffffff8119ca13>] ? security_inode_permission+0x1e/0x20
[ 815.320524]  [<ffffffff8112f33d>] ? link_path_walk+0x1e2/0x763
[ 815.326357]  [<ffffffff8112fd66>] ? path_lookupat+0x5c/0x61a
[ 815.332014]  [<ffffffff810f4810>] ? might_fault+0x89/0x8d
[ 815.337410]  [<ffffffff810f47c7>] ? might_fault+0x40/0x8d
[ 815.342807]  [<ffffffff8113034e>] ? do_path_lookup+0x2a/0xa8
[ 815.348465]  [<ffffffff81132a51>] ? user_path_at_empty+0x5d/0x97
[ 815.354474]  [<ffffffff8107447f>] ? trace_hardirqs_off+0xd/0xf
[ 815.360303]  [<ffffffff81345c4f>] ? _raw_spin_unlock_irqrestore+0x44/0x5a
[ 815.367085]  [<ffffffff81132a9c>] ? user_path_at+0x11/0x13
[ 815.372569]  [<ffffffff81128b64>] ? vfs_fstatat+0x44/0x71
[ 815.377965]  [<ffffffff81128bef>] ? vfs_lstat+0x1e/0x20
[ 815.383192]  [<ffffffff81128c10>] ? sys_newlstat+0x1f/0x40
[ 815.388676]  [<ffffffff810759a8>] ? trace_hardirqs_on_caller+0x12d/0x164
[ 815.395373]  [<ffffffff811e057e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 815.401811]  [<ffffffff8107447f>] ? trace_hardirqs_off+0xd/0xf
[ 815.407642]  [<ffffffff8134d082>] ? system_call_fastpath+0x16/0x1b

The trace is not particularly useful, but it looks like the lock was
acquired recursively even though the message does not say so explicitly.
If the shrink_dcache_parent() entry is accurate, that corresponds to this:

static int select_parent(struct dentry * parent)
{
	struct dentry *this_parent;
	struct list_head *next;
	unsigned seq;
	int found = 0;
	int locked = 0;

	seq = read_seqbegin(&rename_lock);
again:
	this_parent = parent;
	spin_lock(&this_parent->d_lock);	<----- HERE

I'm not overly clear on how VFS locking is meant to work, but it almost
looks as if the last reference to an inode is being dropped during a
sysfs path lookup. Is that meant to happen? Judging by
sysfs_dentry_revalidate() - possibly not. It looks like we must have
reached out_bad: and called shrink_dcache_parent() on a dentry that was
already locked by the running process. Not sure how this could have
happened - Greg, does this look familiar?

-- 
Mel Gorman
SUSE Labs