On Thu, Oct 01, 2015 at 01:53:51PM -0700, Shi, Yang wrote: > On 10/1/2015 10:08 AM, Steven Rostedt wrote: > >On Thu, 1 Oct 2015 09:37:37 -0700 > >Yang Shi <yang.shi@xxxxxxxxxx> wrote: > > > >>BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:917 > >>in_atomic(): 0, irqs_disabled(): 128, pid: 342, name: perf > >>1 lock held by perf/342: > >> #0: (break_hook_lock){+.+...}, at: [<ffffffc0000851ac>] call_break_hook+0x34/0xd0 > >>irq event stamp: 62224 > >>hardirqs last enabled at (62223): [<ffffffc00010b7bc>] __call_rcu.constprop.59+0x104/0x270 > >>hardirqs last disabled at (62224): [<ffffffc0000fbe20>] vprintk_emit+0x68/0x640 > >>softirqs last enabled at (0): [<ffffffc000097928>] copy_process.part.8+0x428/0x17f8 > >>softirqs last disabled at (0): [< (null)>] (null) > >>CPU: 0 PID: 342 Comm: perf Not tainted 4.1.6-rt5 #4 > >>Hardware name: linux,dummy-virt (DT) > >>Call trace: > >>[<ffffffc000089968>] dump_backtrace+0x0/0x128 > >>[<ffffffc000089ab0>] show_stack+0x20/0x30 > >>[<ffffffc0007030d0>] dump_stack+0x7c/0xa0 > >>[<ffffffc0000c878c>] ___might_sleep+0x174/0x260 > >>[<ffffffc000708ac8>] __rt_spin_lock+0x28/0x40 > >>[<ffffffc000708db0>] rt_read_lock+0x60/0x80 > >>[<ffffffc0000851a8>] call_break_hook+0x30/0xd0 > >>[<ffffffc000085a70>] brk_handler+0x30/0x98 > >>[<ffffffc000082248>] do_debug_exception+0x50/0xb8 > >>Exception stack(0xffffffc00514fe30 to 0xffffffc00514ff50) > >>fe20: 00000000 00000000 c1594680 0000007f > >>fe40: ffffffff ffffffff 92063940 0000007f 0550dcd8 ffffffc0 00000000 00000000 > >>fe60: 0514fe70 ffffffc0 000be1f8 ffffffc0 0514feb0 ffffffc0 0008948c ffffffc0 > >>fe80: 00000004 00000000 0514fed0 ffffffc0 ffffffff ffffffff 9282a948 0000007f > >>fea0: 00000000 00000000 9282b708 0000007f c1592820 0000007f 00083914 ffffffc0 > >>fec0: 00000000 00000000 00000010 00000000 00000064 00000000 00000001 00000000 > >>fee0: 005101e0 00000000 c1594680 0000007f c1594740 0000007f ffffffd8 ffffff80 > >>ff00: 00000000 00000000 00000000 00000000 c1594770 0000007f c1594770 0000007f > >>ff20: 00665e10 00000000 7f7f7f7f 7f7f7f7f 01010101 01010101 00000000 00000000 > >>ff40: 928e4cc0 0000007f 91ff11e8 0000007f > >> > >>call_break_hook is called in atomic context (hard irq disabled), so replace > >>the sleepable lock to rcu lock and replace relevant list operations to rcu > >>version. > >> > >>Signed-off-by: Yang Shi <yang.shi@xxxxxxxxxx> > >>--- > >>v1-> v2 > >>Replace list operations to rcu version. > >> > >> arch/arm64/kernel/debug-monitors.c | 10 +++++----- > >> 1 file changed, 5 insertions(+), 5 deletions(-) > >> > >>diff --git a/arch/arm64/kernel/debug-monitors.c b/arch/arm64/kernel/debug-monitors.c > >>index cebf786..cf0e4fc 100644 > >>--- a/arch/arm64/kernel/debug-monitors.c > >>+++ b/arch/arm64/kernel/debug-monitors.c > >>@@ -276,14 +276,14 @@ static DEFINE_RWLOCK(break_hook_lock); > >> void register_break_hook(struct break_hook *hook) > >> { > >> write_lock(&break_hook_lock); > >>- list_add(&hook->node, &break_hook); > >>+ list_add_rcu(&hook->node, &break_hook); > >> write_unlock(&break_hook_lock); > >> } > >> > >> void unregister_break_hook(struct break_hook *hook) > >> { > >> write_lock(&break_hook_lock); > >>- list_del(&hook->node); > >>+ list_del_rcu(&hook->node); > >> write_unlock(&break_hook_lock); > >> } > > > >Shouldn't there be a synchronize_rcu() somewhere? > > So far kgdb is the only user of unregister_break_hook in mainline kernel. > > Just read Documentation/RCU/checklist.txt, it says: > > Note that synchronize_rcu() -only- guarantees to wait until > all currently executing rcu_read_lock()-protected RCU read-side > critical sections complete. > > For kgdb, the unregister is just called in kgdb_arch_exit by > kgdb_unregister_io_module, which is called when rmmod kgdb module. > > The break point handler is done synchronously. So, it sounds should > be not a problem without calling synchronize_rcu(). OK, I will bite... What does "synchronously" mean here? Unless you have somehow guaranteed that all current readers in call_break_hook() are done between the time you call unregister_break_hook() to remove a given break_hook structure and the time you call register_break_hook() to add that same structure back in, you have a problem. What you have now only protects against invoking register_break_hook() on newly allocated and initialized break_hook structure. But the only calls to register_break_hook() that I see in v4.2 use compile-time initialized structures. So the only failure from using non-RCU list primitives would be due to the list_head's ->next pointer initialization. This could momentarily make the list appear to have only the new element, but not the old element. Unless you do a series of register_break_hook() and unregister_break_hook() calls, in which case a previously deleted structure could momentarily appear to already (or still) be in the list. Are those the sorts of failures you are seeing? Thanx, Paul > Yang > > >-- Steve > > > >> > >>@@ -292,11 +292,11 @@ static int call_break_hook(struct pt_regs *regs, unsigned int esr) > >> struct break_hook *hook; > >> int (*fn)(struct pt_regs *regs, unsigned int esr) = NULL; > >> > >>- read_lock(&break_hook_lock); > >>- list_for_each_entry(hook, &break_hook, node) > >>+ rcu_read_lock(); > >>+ list_for_each_entry_rcu(hook, &break_hook, node) > >> if ((esr & hook->esr_mask) == hook->esr_val) > >> fn = hook->fn; > >>- read_unlock(&break_hook_lock); > >>+ rcu_read_unlock(); > >> > >> return fn ? fn(regs, esr) : DBG_HOOK_ERROR; > >> } > > > -- To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html