On Wed, May 22, 2013 at 11:16:56PM -0400, CAI Qian wrote:
> ----- Original Message -----
> > From: "Dave Chinner" <david@xxxxxxxxxxxxx>
> > To: "CAI Qian" <caiqian@xxxxxxxxxx>
> > Cc: "LKML" <linux-kernel@xxxxxxxxxxxxxxx>, stable@xxxxxxxxxxxxxxx, xfs@xxxxxxxxxxx
> > Sent: Wednesday, May 22, 2013 5:53:00 PM
> > Subject: Re: 3.9.2: xfstests triggered panic
> >
> > On Wed, May 22, 2013 at 04:39:58AM -0400, CAI Qian wrote:
> > > Reproduced on almost all s390x guests by running xfstests.
> > >
> > > [14634.396658] XFS (dm-1): Mounting Filesystem
> > > [14634.525522] XFS (dm-1): Ending clean mount
> > > [14640.413007] [<000000000017c6d4>] idle_balance+0x1a0/0x340
> > > [14640.413010] [<000000000063303e>] __schedule+0xa22/0xaf0
> > > [14640.428279] [<0000000000630da6>] schedule_timeout+0x186/0x2c0
> > > [14640.428289] [<00000000001cf864>] rcu_gp_kthread+0x1bc/0x298
> > > [14640.428300] [<0000000000158c5a>] kthread+0xe6/0xec
> > > [14640.428304] [<0000000000634de6>] kernel_thread_starter+0x6/0xc
> > > [14640.428308] [<0000000000634de0>] kernel_thread_starter+0x0/0xc
> > > [14640.428311] Last Breaking-Event-Address:
> > > [14640.428314] [<000000000016bd76>] walk_tg_tree_from+0x3a/0xf4
> > > [14640.428319] list_add corruption. next->prev should be prev (0000000000000918), but was (null). (next= (null)).
> >
> > Where's XFS in this? walk_tg_tree_from() is part of the scheduler
> > code. This kind of implies a stack corruption....
> >
> > > Sometimes, this pops up,
> > > [16907.275002] WARNING: at kernel/rcutree.c:1960
> > >
> > > or this,
> > > [15316.154171] XFS (dm-1): Mounting Filesystem
> > > [15316.255796] XFS (dm-1): Ending clean mount
> > > [15320.364246] 00000000006367a2: e310b0080004 lg %r1,8(%r11)
> > > [15320.364249] 00000000006367a8: 41101010 la %r1,16(%r1)
> > > [15320.364251] 00000000006367ac: e33010000004 lg %r3,0(%r1)
> > > [15320.364252] Call Trace:
> > > [15320.364252] Last Breaking-Event-Address:
> > > [15320.364253] [<0000000000000000>] Kernel stack overflow.
> > > [15320.364308] CPU: 0 Tainted: GF W 3.9.2 #1
> > > [15320.364309] Process rhts-test-runne (pid: 625, task: 000000003dccc890, ksp: 0
> >
> > .... and there you go - a stack overflow. Your kernel stack size is
> > too small.
> >
> > I'd suggest that you need 16k stacks on s390 - IIRC every function
> > call has 128 byte stack frame, and there are call chains 70-80
> > functions deep in the storage stack...
> Hmm, I am unsure how to set to 16k stack there

Are you building a 64 bit s390 kernel or a 32 bit kernel? 32 bit
kernels only have an 8k stack size, 64 bit kernels are 16k (see
arch/s390/Makefile).

$ git grep STACK_SIZE arch/s390 |head -2
arch/s390/Makefile:STACK_SIZE := 8192
arch/s390/Makefile:STACK_SIZE := 16384

As it is, the stack frame usage is worse than I thought:

$ git grep STACK_FRAME_OVERHEAD arch/s390 |head -2
arch/s390/include/uapi/asm/ptrace.h:#define STACK_FRAME_OVERHEAD 96 /* size of minimum stack frame */
arch/s390/include/uapi/asm/ptrace.h:#define STACK_FRAME_OVERHEAD 160 /* size of minimum stack frame */

Overhead is 96 bytes for 32 bit and 160 bytes for 64 bit. So even a
16k stack is going to have big trouble with a 70-80 function deep
call chain.

As for powerpc:

arch/powerpc/include/asm/ppc_asm.h:#define STACKFRAMESIZE 256

Yeah, same issue.
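To put rough numbers on the s390 case, here is a back-of-the-envelope
sketch using the STACK_SIZE and STACK_FRAME_OVERHEAD values grepped
above. It assumes the whole stack is available to one call chain and
that every frame pays at least the minimum overhead; both are
optimistic simplifications, and the function names are made up for
illustration.

```python
# Rough per-frame stack budget from the constants quoted in this mail.
# Assumption: the entire stack is usable by a single call chain, which
# is optimistic since other things share the kernel stack.

STACK_SIZE = {"s390 32-bit": 8192, "s390 64-bit": 16384}
FRAME_OVERHEAD = {"s390 32-bit": 96, "s390 64-bit": 160}


def max_depth(stack_size, overhead):
    """Deepest chain possible if every frame used only the minimum overhead."""
    return stack_size // overhead


def bytes_left_per_frame(stack_size, overhead, depth):
    """Average bytes each frame can spend on locals at the given depth."""
    return stack_size / depth - overhead


if __name__ == "__main__":
    for arch, size in STACK_SIZE.items():
        overhead = FRAME_OVERHEAD[arch]
        print(f"{arch}: at most {max_depth(size, overhead)} minimal frames")
        for depth in (70, 80):
            left = bytes_left_per_frame(size, overhead, depth)
            print(f"  depth {depth}: ~{left:.0f} bytes per frame for locals")
```

With the 64 bit values, an 80-deep chain leaves about 45 bytes per
frame on average for locals and spills, which is not much headroom.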
But, seriously, these stack traces are meaningless to anyone not
familiar with s390 or power7 - they indicate a problem detected in
the idle loop, not wherever the stack overran. Can you please work
with the s390/power7 people to obtain whatever stack it was that
overflowed, and we can go from there?

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx