On Thu, Feb 5, 2015 at 12:51 AM, Paul E. McKenney
<paulmck@xxxxxxxxxxxxxxxxxx> wrote:
> On Wed, Feb 04, 2015 at 11:59:31PM +0100, Rafael J. Wysocki wrote:
>> On Wednesday, February 04, 2015 01:53:58 PM Paul E. McKenney wrote:
>> > On Wed, Feb 04, 2015 at 10:54:07PM +0100, Rafael J. Wysocki wrote:
>> > > On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote:
>> > > > On Wed, Feb 4, 2015 at 9:35 AM, Stephen Rothwell <sfr@xxxxxxxxxxxxxxxx> wrote:
>> > > > > Hi all,
>> > > > >
>> > > > > The next release I will be making will be next-20150209 - which will
>> > > > > probably be after the v3.19 release.
>> > > > >
>> > > > > Changes since 20150203:
>> > > > >
>> > > > > The sound-asoc tree gained a conflict against the sound tree.
>> > > > >
>> > > > > The scsi tree gained a build failure caused by an interaction with the
>> > > > > driver-core tree. I applied a merge fix patch.
>> > > > >
>> > > > > The akpm-current tree gained a build failure for which I disabled
>> > > > > CONFIG_KASAN.
>> > > > >
>> > > > > Non-merge commits (relative to Linus' tree): 7461
>> > > > > 7314 files changed, 309736 insertions(+), 172363 deletions(-)
>> > > > >
>> > > > > ----------------------------------------------------------------------------
>> > > > >
>> > > >
>> > > > [ CC linux-rcu | linux-pm | intel_pstate maintainers ]
>> > >
>> > > Dirk is not the maintainer of intel_pstate any more, CC: Kristen.
>> > >
>> > > > Hi,
>> > > >
>> > > > after suspend-and-resume I see the following call-trace:
>> > >
>> > > Do you see that after CPU1 offline too?
>> > >
>> > > > ...
>> > > > [ 1144.482666] Disabling non-boot CPUs ...
>> > > > [ 1144.483000] intel_pstate CPU 1 exiting
>> > > > [ 1144.486064]
>> > > > [ 1144.486065] ===============================
>> > > > [ 1144.486067] smpboot: CPU 1 didn't die...
>> > > > [ 1144.486067] [ INFO: suspicious RCU usage. ]
>> > > > [ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted
>> > > > [ 1144.486070] -------------------------------
>> > > > [ 1144.486072] include/trace/events/tlb.h:35 suspicious
>> > > > rcu_dereference_check() usage!
>> > > > [ 1144.486073]
>> > > > [ 1144.486073] other info that might help us debug this:
>> > > > [ 1144.486073]
>> > > > [ 1144.486074]
>> > > > [ 1144.486074] RCU used illegally from offline CPU!
>> > > > [ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0
>> > > > [ 1144.486076] no locks held by swapper/1/0.
>> > > > [ 1144.486076]
>> > > > [ 1144.486076] stack backtrace:
>> > > > [ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted
>> > > > 3.19.0-rc7-next-20150204.1-iniza-small #1
>> > > > [ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
>> > > > 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
>> > > > [ 1144.486085] 0000000000000001 ffff88011a44fe18 ffffffff817e370d
>> > > > 0000000000000011
>> > > > [ 1144.486088] ffff88011a448290 ffff88011a44fe48 ffffffff810d6847
>> > > > ffff8800c66b9600
>> > > > [ 1144.486091] 0000000000000001 ffff88011a44c000 ffffffff81cb3900
>> > > > ffff88011a44fe78
>> > > > [ 1144.486092] Call Trace:
>> > > > [ 1144.486099] [<ffffffff817e370d>] dump_stack+0x4c/0x65
>> > > > [ 1144.486104] [<ffffffff810d6847>] lockdep_rcu_suspicious+0xe7/0x120
>> >
>> > As near as I can tell, idle_task_exit() is running on an offline CPU,
>> > then calling switch_mm() which contains trace_tlb_flush(), which uses RCU.
>> > And RCU is objecting to being used from a CPU that it is ignoring.
>> >
>> > One approach would be to push RCU's idea of when the CPU goes offline
>> > down into arch code in this case, using some Kconfig symbol and
>> > the usual conditional compilation. Another approach would be to
>> > invoke the trace calls under cpu_online(), for example, for the
>> > first such call in switch_mm():
>> >
>> >         if (cpu_online(smp_processor_id()))
>> >                 trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
>> >
>> > The compiler would discard this if tracing was disabled.
>>
>> That looks less intrusive to me.
>
> One possible concern is increased context-switch path length, but that
> would only be the case where tracing is enabled by default.
>

Hmmm, which kernel-config "trace" options do you mean in particular?

>> > Other thoughts?
>>
>> Well, the whole issue here seems to be that common code using RCU is also
>> useful in places where RCU doesn't want to be used. Arguably, we can deal
>> with all of those cases in a whack-a-mole manner, but that doesn't seem to
>> scale too well.
>
> Well, I did put a change into -next that makes these particular moles
> stick their heads up farther, so this is not a random event. And in
> this particular case, we do have the option of extending RCU's reach to
> cover this operation, at the expense of a bit more intrusion by RCU into
> arch-specific code. If tracing is enabled by default by major distros,
> that might be the right thing to do, unappealing though it might be.
>

Can you point me to that change in rcu-next?

> But yes, it would have been far better for RCU to have been picky to
> begin with, so that these issues could have been addressed as they were
> added to the kernel. I guess one possible source of comfort is that once
> this is in place, future issues will make themselves immediately apparent.
>

Not sure what I now can do to help to trigger this down.
Here is 01:00 a.m. -> bedtime :-).

- Sedat -