Re: ia64 won't boot because of rcu_sched self-detected stall

On Tue, Aug 21, 2012 at 05:46:08PM -0700, Paul E. McKenney wrote:
> On Tue, Aug 21, 2012 at 11:53:50PM +0000, Luck, Tony wrote:
> > Thanks for the pointers.
> > 
> > I turned on CONFIG_RCU_CPU_STALL_INFO=y and bumped RCU_STALL_RAT_DELAY
> > from 2 to 20
> > 
> > This is the new console log.  There is a minute of hang before the first
> > pair of stack traces. Then hang for a minute and the second pair show
> > up.
> > 
> > Linux version 3.6.0-rc2-zx1-smp-next-20120821 (aegl@linux-bxb1) (gcc version 4.3.4 [gcc-4_3-branch revision 152973] (SUSE Linux) ) #2 SMP Tue Aug 21 16:44:17 PDT 2012
> > EFI v1.10 by HP: SALsystab=0x3fefa000 ACPI 2.0=0x3fd5e000 SMBIOS=0x3fefc000 HCDP=0x3fd5c000
> > Early serial console at MMIO 0xff5e0000 (options '9600')
> > bootconsole [uart0] enabled
> > PCDP: v0 at 0x3fd5c000
> > Explicit "console="; ignoring PCDP
> > ACPI: RSDP 000000003fd5e000 00028 (v02     HP)
> > ACPI: XSDT 000000003fd5e02c 00094 (v01     HP   rx2620 00000000   HP 00000000)
> > ACPI: FACP 000000003fd67390 000F4 (v03     HP   rx2620 00000000   HP 00000000)
> > ACPI BIOS Bug: Warning: 32/64X length mismatch in FADT/Gpe0Block: 32/16 (20120711/tbfadt-567)
> > ACPI BIOS Bug: Warning: 32/64X length mismatch in FADT/Gpe1Block: 32/16 (20120711/tbfadt-567)
> > ACPI: DSDT 000000003fd5e100 05F3C (v01     HP   rx2620 00000007 INTL 02012044)
> > ACPI: FACS 000000003fd67488 00040
> > ACPI: SPCR 000000003fd674c8 00050 (v01     HP   rx2620 00000000   HP 00000000)
> > ACPI: DBGP 000000003fd67518 00034 (v01     HP   rx2620 00000000   HP 00000000)
> > ACPI: APIC 000000003fd67610 000B0 (v01     HP   rx2620 00000000   HP 00000000)
> > ACPI: SPMI 000000003fd67550 00050 (v04     HP   rx2620 00000000   HP 00000000)
> > ACPI: CPEP 000000003fd675a0 00034 (v01     HP   rx2620 00000000   HP 00000000)
> > ACPI: SSDT 000000003fd64040 001D6 (v01     HP   rx2620 00000006 INTL 02012044)
> > ACPI: SSDT 000000003fd64220 00702 (v01     HP   rx2620 00000006 INTL 02012044)
> > ACPI: SSDT 000000003fd64930 00A16 (v01     HP   rx2620 00000006 INTL 02012044)
> > ACPI: SSDT 000000003fd65350 00A16 (v01     HP   rx2620 00000006 INTL 02012044)
> > ACPI: SSDT 000000003fd65d70 00A16 (v01     HP   rx2620 00000006 INTL 02012044)
> > ACPI: SSDT 000000003fd66790 00A16 (v01     HP   rx2620 00000006 INTL 02012044)
> > ACPI: SSDT 000000003fd671b0 000EB (v01     HP   rx2620 00000006 INTL 02012044)
> > ACPI: SSDT 000000003fd672a0 000EF (v01     HP   rx2620 00000006 INTL 02012044)
> > ACPI: Local APIC address c0000000fee00000
> > 2 CPUs available, 2 CPUs total
> > warning: skipping physical page 0
> > Initial ramdisk at: 0xe00000407e9bb000 (6071698 bytes)
> > SAL 3.1: HP version 3.15
> > SAL Platform features: None
> > SAL: AP wakeup using external interrupt vector 0xff
> > MCA related initialization done
> > warning: skipping physical page 0
> > Zone ranges:
> >   DMA      [mem 0x00004000-0xffffffff]
> >   Normal   [mem 0x100000000-0x407ffc7fff]
> > Movable zone start for each node
> > Early memory node ranges
> >   node   0: [mem 0x00004000-0x3f4ebfff]
> >   node   0: [mem 0x3fc00000-0x3fd5bfff]
> >   node   0: [mem 0x4040000000-0x407fd2bfff]
> >   node   0: [mem 0x407fd98000-0x407fe07fff]
> >   node   0: [mem 0x407fe80000-0x407ffc7fff]
> > Virtual mem_map starts at 0xa0007fffc7900000
> > Built 1 zonelists in Zone order, mobility grouping off.  Total pages: 72586
> > Kernel command line: BOOT_IMAGE=scsi0:\efi\SuSE\l-zx1-smp.gz root=/dev/disk/by-id/scsi-200000e1100a5d5f2-part2  console=uart,mmio,0xff5e0000 
> > PID hash table entries: 4096 (order: 1, 32768 bytes)
> > Dentry cache hash table entries: 262144 (order: 7, 2097152 bytes)
> > Inode-cache hash table entries: 131072 (order: 6, 1048576 bytes)
> > Memory: 2048432k/2086064k available (13698k code, 37632k reserved, 5791k data, 816k init)
> > SLUB: Genslabs=17, HWalign=128, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
> > Hierarchical RCU implementation.
> > 	Additional per-CPU info printed with stalls.
> > 	RCU restricting CPUs from NR_CPUS=16 to nr_cpu_ids=2.
> > NR_IRQS:768
> > ACPI: Local APIC address c0000000fee00000
> > GSI 36 (level, low) -> CPU 0 (0x0000) vector 48
> > Console: colour dummy device 80x25
> > Calibrating delay loop... 1945.60 BogoMIPS (lpj=3891200)
> > pid_max: default: 32768 minimum: 301
> > Mount-cache hash table entries: 1024
> > ACPI: Core revision 20120711
> > Boot processor id 0x0/0x0
> > Fixed BSP b0 value from CPU 1
> > CPU 1: synchronized ITC with CPU 0 (last diff -3 cycles, maxerr 579 cycles)
> > Brought up 2 CPUs
> > Total of 2 processors activated (3891.20 BogoMIPS).
> > SMBIOS 2.3 present.
> > NET: Registered protocol family 16
> > ACPI: bus type pci registered
> > bio: create slab <bio-0> at 0
> > ACPI: Added _OSI(Module Device)
> > ACPI: Added _OSI(Processor Device)
> > ACPI: Added _OSI(3.0 _SCP Extensions)
> > ACPI: Added _OSI(Processor Aggregator Device)
> > INFO: rcu_sched self-detected stall on CPU
> > 	1: (15000 ticks this GP) idle=001/140000000000001/0 
> 
> OK, this is strange.  The stacks below would lead me to believe that
> the CPUs are idle.  But the idle= value above says that RCU believes
> that this CPU was executing in non-idle process context when the
> interrupt occurred.
> 
> OK, time to take a look at the IA64 idle loop.  And I don't see any
> calls to rcu_idle_enter()...  Please see below for my best guess as
> to where to place it and rcu_idle_exit() -- the rule is that there must
> be no use of RCU read-side critical sections between the call to the
> rcu_idle_enter() and the rcu_idle_exit(), so you probably know better
> than I where to put them.
> 
> void __attribute__((noreturn))
> cpu_idle (void)
> {
> 	void (*mark_idle)(int) = ia64_mark_idle;
>   	int cpu = smp_processor_id();
> 
> 	/* endless idle loop with no priority at all */
> 	while (1) {
> 		rcu_idle_enter();  /* HERE */
> 		if (can_do_pal_halt) {
> 			current_thread_info()->status &= ~TS_POLLING;
> 			/*
> 			 * TS_POLLING-cleared state must be visible before we
> 			 * test NEED_RESCHED:
> 			 */
> 			smp_mb();
> 		} else {
> 			current_thread_info()->status |= TS_POLLING;
> 		}
> 
> 		if (!need_resched()) {
> 			void (*idle)(void);
> #ifdef CONFIG_SMP
> 			min_xtp();
> #endif
> 			rmb();
> 			if (mark_idle)
> 				(*mark_idle)(1);
> 
> 			idle = pm_idle;
> 			if (!idle)
> 				idle = default_idle;
> 			(*idle)();
> 			if (mark_idle)
> 				(*mark_idle)(0);
> #ifdef CONFIG_SMP
> 			normal_xtp();
> #endif
> 		}
> 		rcu_idle_exit();  /* AND HERE */
> 		schedule_preempt_disabled();
> 		check_pgt_cache();
> 		if (cpu_is_offline(cpu))
> 			play_dead();
> 	}
> }
> 
> Without the calls to rcu_idle_enter() and rcu_idle_exit(), RCU has no
> way of knowing that the CPU is idle, so waits forever for a context
> switch.
> 
> Ah, I bet I know what happened...  I don't see tick_nohz_idle_enter(),
> so I would guess that there is no dyntick-idle, so the recent changes in
> dyntick-idle didn't cause rcu_idle_enter() to be added.
> 
> I wonder how many other architectures don't do dyntick-idle?
> 
> Looks like about 12 more.  Probably need fixing as well...

Ouch, that's bad. OK, see below for the conversion of the other architectures.

While doing this, I realized that most of these archs just use the same
cpu_idle() function, basically:

void cpu_idle(void)
{
	while (1) {
+		rcu_idle_enter();
		while (!need_resched())
			do_arch_thing();
+		rcu_idle_exit();
		schedule_preempt_disabled();
	}
}

So I think it may be worth creating a generic "simple idle loop" function
that those archs can call. That way there is less conversion to do.
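
To make the idea concrete, here is a rough userspace sketch of what such a
shared helper might look like. It is not a proposed kernel interface:
simple_idle_once(), mock_arch_idle() and the counters are hypothetical names,
and the rcu_idle_enter(), rcu_idle_exit(), need_resched() and
schedule_preempt_disabled() definitions are stand-in stubs, not the real
kernel functions. The asserts only encode the rule quoted above: the arch idle
routine runs while RCU considers the CPU idle, and the scheduler runs only
after rcu_idle_exit().

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-ins for kernel state (hypothetical, for illustration). */
static int rcu_idle_depth;    /* 1 while RCU treats this CPU as idle */
static int resched_countdown; /* idle polls left before need_resched() is true */
static int idle_polls;        /* how often the arch idle routine ran */
static int schedule_calls;    /* how often the scheduler stub ran */

/* Stubs with the same names as the kernel functions, NOT the real ones. */
static void rcu_idle_enter(void) { rcu_idle_depth++; }
static void rcu_idle_exit(void)  { rcu_idle_depth--; }
static bool need_resched(void)   { return resched_countdown <= 0; }

static void schedule_preempt_disabled(void)
{
	/* The scheduler may use RCU, so RCU must see the CPU as non-idle. */
	assert(rcu_idle_depth == 0);
	schedule_calls++;
}

/* Hypothetical arch hook: stands in for idle(), platform_idle(), etc. */
static void mock_arch_idle(void)
{
	/* Must run between rcu_idle_enter() and rcu_idle_exit(). */
	assert(rcu_idle_depth == 1);
	idle_polls++;
	resched_countdown--;
}

/*
 * Hypothetical helper: one pass of the loop body shared by the simple
 * archs. A kernel version would wrap this in an endless while (1).
 */
static void simple_idle_once(void (*arch_idle)(void))
{
	rcu_idle_enter();
	while (!need_resched())
		arch_idle();
	rcu_idle_exit();
	schedule_preempt_disabled();
}
```

An arch would only supply its own idle hook (cpu_relax(), barrier(),
platform_idle() and so on in the patches below). The sketch does a single pass
rather than looping forever so that it terminates when run outside the kernel.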

Now this is all a regression, so IMO we should first fix each arch locally and
add the generic idle loop later, since that is more of a feature.

Hmm?

I'm cooking the patches.

diff --git a/arch/alpha/kernel/process.c b/arch/alpha/kernel/process.c
index 153d3fc..2ebf7b5 100644
--- a/arch/alpha/kernel/process.c
+++ b/arch/alpha/kernel/process.c
@@ -28,6 +28,7 @@
 #include <linux/tty.h>
 #include <linux/console.h>
 #include <linux/slab.h>
+#include <linux/rcupdate.h>
 
 #include <asm/reg.h>
 #include <asm/uaccess.h>
@@ -50,13 +51,16 @@ cpu_idle(void)
 {
 	set_thread_flag(TIF_POLLING_NRFLAG);
 
+	preempt_disable();
 	while (1) {
 		/* FIXME -- EV6 and LCA45 know how to power down
 		   the CPU.  */
 
+		rcu_idle_enter();
 		while (!need_resched())
 			cpu_relax();
-		schedule();
+		rcu_idle_exit();
+		schedule_preempt_disabled();
 	}
 }
 
diff --git a/arch/cris/kernel/process.c b/arch/cris/kernel/process.c
index 66fd017..7f65be6 100644
--- a/arch/cris/kernel/process.c
+++ b/arch/cris/kernel/process.c
@@ -25,6 +25,7 @@
 #include <linux/elfcore.h>
 #include <linux/mqueue.h>
 #include <linux/reboot.h>
+#include <linux/rcupdate.h>
 
 //#define DEBUG
 
@@ -74,6 +75,7 @@ void cpu_idle (void)
 {
 	/* endless idle loop with no priority at all */
 	while (1) {
+		rcu_idle_enter();
 		while (!need_resched()) {
 			void (*idle)(void);
 			/*
@@ -86,6 +88,7 @@ void cpu_idle (void)
 				idle = default_idle;
 			idle();
 		}
+		rcu_idle_exit();
 		schedule_preempt_disabled();
 	}
 }
diff --git a/arch/frv/kernel/process.c b/arch/frv/kernel/process.c
index ff95f50..2eb7fa5 100644
--- a/arch/frv/kernel/process.c
+++ b/arch/frv/kernel/process.c
@@ -25,6 +25,7 @@
 #include <linux/reboot.h>
 #include <linux/interrupt.h>
 #include <linux/pagemap.h>
+#include <linux/rcupdate.h>
 
 #include <asm/asm-offsets.h>
 #include <asm/uaccess.h>
@@ -69,12 +70,14 @@ void cpu_idle(void)
 {
 	/* endless idle loop with no priority at all */
 	while (1) {
+		rcu_idle_enter();
 		while (!need_resched()) {
 			check_pgt_cache();
 
 			if (!frv_dma_inprogress && idle)
 				idle();
 		}
+		rcu_idle_exit();
 
 		schedule_preempt_disabled();
 	}
diff --git a/arch/h8300/kernel/process.c b/arch/h8300/kernel/process.c
index 0e9c315..f153ed1 100644
--- a/arch/h8300/kernel/process.c
+++ b/arch/h8300/kernel/process.c
@@ -36,6 +36,7 @@
 #include <linux/reboot.h>
 #include <linux/fs.h>
 #include <linux/slab.h>
+#include <linux/rcupdate.h>
 
 #include <asm/uaccess.h>
 #include <asm/traps.h>
@@ -78,8 +79,10 @@ void (*idle)(void) = default_idle;
 void cpu_idle(void)
 {
 	while (1) {
+		rcu_idle_enter();
 		while (!need_resched())
 			idle();
+		rcu_idle_exit();
 		schedule_preempt_disabled();
 	}
 }
diff --git a/arch/m32r/kernel/process.c b/arch/m32r/kernel/process.c
index 3a4a32b2..384e63f 100644
--- a/arch/m32r/kernel/process.c
+++ b/arch/m32r/kernel/process.c
@@ -26,6 +26,7 @@
 #include <linux/ptrace.h>
 #include <linux/unistd.h>
 #include <linux/hardirq.h>
+#include <linux/rcupdate.h>
 
 #include <asm/io.h>
 #include <asm/uaccess.h>
@@ -82,6 +83,7 @@ void cpu_idle (void)
 {
 	/* endless idle loop with no priority at all */
 	while (1) {
+		rcu_idle_enter();
 		while (!need_resched()) {
 			void (*idle)(void) = pm_idle;
 
@@ -90,6 +92,7 @@ void cpu_idle (void)
 
 			idle();
 		}
+		rcu_idle_exit();
 		schedule_preempt_disabled();
 	}
 }
diff --git a/arch/m68k/kernel/process.c b/arch/m68k/kernel/process.c
index c488e3c..ac2892e 100644
--- a/arch/m68k/kernel/process.c
+++ b/arch/m68k/kernel/process.c
@@ -25,6 +25,7 @@
 #include <linux/reboot.h>
 #include <linux/init_task.h>
 #include <linux/mqueue.h>
+#include <linux/rcupdate.h>
 
 #include <asm/uaccess.h>
 #include <asm/traps.h>
@@ -75,8 +76,10 @@ void cpu_idle(void)
 {
 	/* endless idle loop with no priority at all */
 	while (1) {
+		rcu_idle_enter();
 		while (!need_resched())
 			idle();
+		rcu_idle_exit();
 		schedule_preempt_disabled();
 	}
 }
diff --git a/arch/mn10300/kernel/process.c b/arch/mn10300/kernel/process.c
index 7dab0cd..e9cceba 100644
--- a/arch/mn10300/kernel/process.c
+++ b/arch/mn10300/kernel/process.c
@@ -25,6 +25,7 @@
 #include <linux/err.h>
 #include <linux/fs.h>
 #include <linux/slab.h>
+#include <linux/rcupdate.h>
 #include <asm/uaccess.h>
 #include <asm/pgtable.h>
 #include <asm/io.h>
@@ -107,6 +108,7 @@ void cpu_idle(void)
 {
 	/* endless idle loop with no priority at all */
 	for (;;) {
+		rcu_idle_enter();
 		while (!need_resched()) {
 			void (*idle)(void);
 
@@ -121,6 +123,7 @@ void cpu_idle(void)
 			}
 			idle();
 		}
+		rcu_idle_exit();
 
 		schedule_preempt_disabled();
 	}
diff --git a/arch/parisc/kernel/process.c b/arch/parisc/kernel/process.c
index d4b94b3..c54a4db 100644
--- a/arch/parisc/kernel/process.c
+++ b/arch/parisc/kernel/process.c
@@ -48,6 +48,7 @@
 #include <linux/unistd.h>
 #include <linux/kallsyms.h>
 #include <linux/uaccess.h>
+#include <linux/rcupdate.h>
 
 #include <asm/io.h>
 #include <asm/asm-offsets.h>
@@ -69,8 +70,10 @@ void cpu_idle(void)
 
 	/* endless idle loop with no priority at all */
 	while (1) {
+		rcu_idle_enter();
 		while (!need_resched())
 			barrier();
+		rcu_idle_exit();
 		schedule_preempt_disabled();
 		check_pgt_cache();
 	}
diff --git a/arch/score/kernel/process.c b/arch/score/kernel/process.c
index 2707023..637970c 100644
--- a/arch/score/kernel/process.c
+++ b/arch/score/kernel/process.c
@@ -27,6 +27,7 @@
 #include <linux/reboot.h>
 #include <linux/elfcore.h>
 #include <linux/pm.h>
+#include <linux/rcupdate.h>
 
 void (*pm_power_off)(void);
 EXPORT_SYMBOL(pm_power_off);
@@ -50,9 +51,10 @@ void __noreturn cpu_idle(void)
 {
 	/* endless idle loop with no priority at all */
 	while (1) {
+		rcu_idle_enter();
 		while (!need_resched())
 			barrier();
-
+		rcu_idle_exit();
 		schedule_preempt_disabled();
 	}
 }
diff --git a/arch/xtensa/kernel/process.c b/arch/xtensa/kernel/process.c
index 2c8d6a3..bc44311 100644
--- a/arch/xtensa/kernel/process.c
+++ b/arch/xtensa/kernel/process.c
@@ -31,6 +31,7 @@
 #include <linux/mqueue.h>
 #include <linux/fs.h>
 #include <linux/slab.h>
+#include <linux/rcupdate.h>
 
 #include <asm/pgtable.h>
 #include <asm/uaccess.h>
@@ -110,8 +111,10 @@ void cpu_idle(void)
 
 	/* endless idle loop with no priority at all */
 	while (1) {
+		rcu_idle_enter();
 		while (!need_resched())
 			platform_idle();
+		rcu_idle_exit();
 		schedule_preempt_disabled();
 	}
 }

		

