On Thu, 5 Feb 2015 20:25:21 +0100
Sedat Dilek <sedat.dilek@xxxxxxxxx> wrote:

> On Thu, Feb 5, 2015 at 7:45 PM, Paul E. McKenney
> <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
> > On Thu, Feb 05, 2015 at 10:35:33AM -0800, Dave Hansen wrote:
> >> On 02/05/2015 10:34 AM, Paul E. McKenney wrote:
> >> >> > Did I actually need to be
> >> >> > onlining/offlining CPUs to hit the splat that Sedat was reporting?
> >> > Yep, you do need to offline at least one CPU to hit that splat.
> >>
> >> Heh, do we need a debugging mode that will randomly offline/online CPUs? :)
> >
> > For that, kernel/rcu/rcutorture.c and kernel/locking/locktorture.c
> > are your friends. ;-)
> >
> > The problem is that I only run RCU-relevant combinations of Kconfigs,
> > which means that I missed the ones that Sedat used to find this problem.
> > So I guess it is a good thing that others run -next testing.
> >
>
> [ Revived by a voltaren resinat pill... ]
>
> I reverted "x86/mm: Omit switch_mm() tracing for offline CPUs"
> ...and...
> applied "tlb: Don't do trace_tlb_flush() on offline CPUs"
> ...in my build-dir.

Is this Paul's version of the patch or mine? If it is just mine, do you
know if Paul's version triggers this too?

> ( I did not build from scratch but re-invoking make "updated" the
> files touched by Steven's patch, see attached build-log. )
>
> Unfortunately, the call-trace remains when doing an offlining of cpu1.
> ( It's good to see it's reproducible. )

Was the tracepoint enabled? Or was there some other rcu call that
triggered this. Or would cpu_online(smp_processor_id()) return true at
this point?

-- Steve

>
> root# echo 0 > /sys/devices/system/cpu/cpu1/online
>
> [ 121.652796] intel_pstate CPU 1 exiting
> [ 121.666272]
> [ 121.666274] ===============================
> [ 121.666274] [ INFO: suspicious RCU usage. ]
> [ 121.666277] 3.19.0-rc7-next-20150204.7-iniza-small #4 Not tainted
> [ 121.666278] -------------------------------
> [ 121.666280] include/trace/events/tlb.h:37 suspicious rcu_dereference_check() usage!
> [ 121.666281]
> [ 121.666281] other info that might help us debug this:
> [ 121.666281]
> [ 121.666282]
> [ 121.666282] RCU used illegally from offline CPU!
> [ 121.666282] rcu_scheduler_active = 1, debug_locks = 0
> [ 121.666283] no locks held by swapper/1/0.
> [ 121.666284]
> [ 121.666284] stack backtrace:
> [ 121.666287] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.19.0-rc7-next-20150204.7-iniza-small #4
> [ 121.666288] Hardware name: SAMSUNG ELECTRONICS CO., LTD. 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
> [ 121.666293] 0000000000000001 ffff88011a44fe18 ffffffff817e39cd 0000000000000011
> [ 121.666296] ffff88011a448290 ffff88011a44fe48 ffffffff810d6af7 ffff8800d3dfaac0
> [ 121.666299] 0000000000000001 ffffffff81d32ce0 0000000000000005 ffff88011a44fe78
> [ 121.666300] Call Trace:
> [ 121.666308] [<ffffffff817e39cd>] dump_stack+0x4c/0x65
> [ 121.666313] [<ffffffff810d6af7>] lockdep_rcu_suspicious+0xe7/0x120
> [ 121.666318] [<ffffffff810b73f9>] idle_task_exit+0x1c9/0x260
> [ 121.666322] [<ffffffff81054c4e>] play_dead_common+0xe/0x50
> [ 121.666325] [<ffffffff81054ca5>] native_play_dead+0x15/0x140
> [ 121.666330] [<ffffffff8102963f>] arch_cpu_idle_dead+0xf/0x20
> [ 121.666333] [<ffffffff810cdb4e>] cpu_startup_entry+0x37e/0x580
> [ 121.666336] [<ffffffff81053e20>] start_secondary+0x140/0x150
> [ 121.666744] smpboot: CPU 1 is now offline
>
> From rcu point this is now safe?
> But another area (linux-pm?) is still affected?
> I will try to test "vanilla" pm-next if the problem exists with
> intel_pstate as suggested by Rafael.
> Hmmm, not sure how I can get the pm-next code which went into
> next-20150204 as linux-pm.git#linux-next was feeded with new stuff.
>
>
> - Sedat -
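
For anyone reproducing this, it can help to confirm from userspace that the CPU really is marked offline after the echo into sysfs. The following is only a minimal sketch, in Python rather than kernel code. It assumes the usual /sys/devices/system/cpu/online file with its comma-and-range list format (for example "0,2-3"); the helper names parse_cpu_list() and cpu_is_online() are made up for illustration. It does not tell you what cpu_online(smp_processor_id()) returns inside idle_task_exit(); it only shows the state the kernel exports afterwards.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Parse a kernel CPU list string such as "0,2-3" into a set of ints.

    Hypothetical helper for illustration; not part of any kernel tool.
    """
    cpus = set()
    for chunk in text.strip().split(","):
        if not chunk:
            # An empty file (or stray comma) contributes no CPUs.
            continue
        if "-" in chunk:
            # A range like "2-3" is inclusive on both ends.
            first, last = chunk.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(chunk))
    return cpus


def cpu_is_online(cpu, online_path="/sys/devices/system/cpu/online"):
    """Return True if `cpu` appears in the sysfs online CPU list.

    The default path is an assumption about a typical Linux sysfs layout.
    """
    return cpu in parse_cpu_list(Path(online_path).read_text())
```

After "echo 0 > /sys/devices/system/cpu/cpu1/online" completes, cpu_is_online(1) would be expected to return False on such a system, while the splat above is printed earlier, from the dying CPU's idle task.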