3.9.2/3.9.3: stack overrun on s390x and ppc64 (WAS Re: 3.9.2: xfstests triggered panic)

Original report:
http://oss.sgi.com/archives/xfs/2013-05/msg00683.html

Also seen on Power7:
http://marc.info/?l=linux-kernel&m=136927904900692&w=2

CAI Qian

----- Original Message -----
> From: "Dave Chinner" <david@xxxxxxxxxxxxx>
> To: "CAI Qian" <caiqian@xxxxxxxxxx>
> Cc: "LKML" <linux-kernel@xxxxxxxxxxxxxxx>, stable@xxxxxxxxxxxxxxx, xfs@xxxxxxxxxxx
> Sent: Thursday, May 23, 2013 11:46:11 AM
> Subject: Re: 3.9.2: xfstests triggered panic
> 
> On Wed, May 22, 2013 at 11:16:56PM -0400, CAI Qian wrote:
> > ----- Original Message -----
> > > From: "Dave Chinner" <david@xxxxxxxxxxxxx>
> > > To: "CAI Qian" <caiqian@xxxxxxxxxx>
> > > Cc: "LKML" <linux-kernel@xxxxxxxxxxxxxxx>, stable@xxxxxxxxxxxxxxx, xfs@xxxxxxxxxxx
> > > Sent: Wednesday, May 22, 2013 5:53:00 PM
> > > Subject: Re: 3.9.2: xfstests triggered panic
> > > 
> > > On Wed, May 22, 2013 at 04:39:58AM -0400, CAI Qian wrote:
> > > > Reproduced on almost all s390x guests by running xfstests.
> > > > 
> > > > [14634.396658] XFS (dm-1): Mounting Filesystem
> > > > [14634.525522] XFS (dm-1): Ending clean mount
> > > > [14640.413007]  [<000000000017c6d4>] idle_balance+0x1a0/0x340
> > > > [14640.413010]  [<000000000063303e>] __schedule+0xa22/0xaf0
> > > > [14640.428279]  [<0000000000630da6>] schedule_timeout+0x186/0x2c0
> > > > [14640.428289]  [<00000000001cf864>] rcu_gp_kthread+0x1bc/0x298
> > > > [14640.428300]  [<0000000000158c5a>] kthread+0xe6/0xec
> > > > [14640.428304]  [<0000000000634de6>] kernel_thread_starter+0x6/0xc
> > > > [14640.428308]  [<0000000000634de0>] kernel_thread_starter+0x0/0xc
> > > > [14640.428311] Last Breaking-Event-Address:
> > > > [14640.428314]  [<000000000016bd76>] walk_tg_tree_from+0x3a/0xf4
> > > > [14640.428319]  list_add corruption. next->prev should be prev (0000000000000918), but was           (null). (next=          (null)).
> > > 
> > > Where's XFS in this? walk_tg_tree_from() is part of the scheduler
> > > code. This kind of implies a stack corruption....
> > > 
> > > > Sometimes, this pops up,
> > > > [16907.275002] WARNING: at kernel/rcutree.c:1960
> > > > 
> > > > or this,
> > > > [15316.154171] XFS (dm-1): Mounting Filesystem
> > > > [15316.255796] XFS (dm-1): Ending clean mount
> > > > [15320.364246]            00000000006367a2: e310b0080004        lg %r1,8(%r11)
> > > > [15320.364249]            00000000006367a8: 41101010            la %r1,16(%r1)
> > > > [15320.364251]            00000000006367ac: e33010000004        lg %r3,0(%r1)
> > > > [15320.364252] Call Trace:
> > > > [15320.364252] Last Breaking-Event-Address:
> > > > [15320.364253]  [<0000000000000000>] Kernel stack overflow.
> > > > [15320.364308] CPU: 0 Tainted: GF       W    3.9.2 #1
> > > > [15320.364309] Process rhts-test-runne (pid: 625, task: 000000003dccc890, ksp: 0
> > > 
> > > .... and there you go - a stack overflow. Your kernel stack size is
> > > too small.
> > > 
> > > I'd suggest that you need 16k stacks on s390 - IIRC every function
> > > call has 128 byte stack frame, and there are call chains 70-80
> > > functions deep in the storage stack...
> > Hmm, I am unsure how to set a 16k stack there
> 
> Are you building a 64 bit s390 kernel or a 32 bit kernel? 32 bit
> kernels only have an 8k stack size, 64 bit kernels have 16k (see
> arch/s390/Makefile).
> 
> $ git grep STACK_SIZE arch/s390 |head -2
> arch/s390/Makefile:STACK_SIZE   := 8192
> arch/s390/Makefile:STACK_SIZE   := 16384
> 
> As it is, the stack frame usage is worse than I thought:
> 
> $ git grep STACK_FRAME_OVERHEAD arch/s390 |head -2
> arch/s390/include/uapi/asm/ptrace.h:#define STACK_FRAME_OVERHEAD 96      /* size of minimum stack frame */
> arch/s390/include/uapi/asm/ptrace.h:#define STACK_FRAME_OVERHEAD 160      /* size of minimum stack frame */
> 
> Overhead is 96 bytes for 32 bit and 160 bytes for 64 bit. So even a
> 16k stack is going to have big trouble with a 70-80 function deep
> call chain.
> 
> As for powerpc:
> 
> arch/powerpc/include/asm/ppc_asm.h:#define STACKFRAMESIZE 256
> 
> Yeah, same issue.
> 
> But, seriously, these stack traces are meaningless to anyone not
> familiar with s390 or power7 - they indicate a problem detected
> in the idle loop, not wherever the stack overran.
> 
> Can you please work with the s390/power7 people to obtain whatever
> stack it was that overflowed, and we can go from there.
> 
> Cheers,
> 
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
> 
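
The stack arithmetic in the quoted reply can be sketched roughly as follows. This is only a back-of-the-envelope estimate: the stack sizes and minimum frame sizes are the values quoted above from arch/s390, the 70-80 call depth is the estimate given in the thread, and the helper name frame_overhead is hypothetical. It counts only the minimum frame per call, so real usage would be higher.

```python
def frame_overhead(stack_size, min_frame, depth):
    """Return (bytes used by minimum frames, bytes left for everything else)."""
    used = min_frame * depth
    return used, stack_size - used

# (stack size, minimum frame size) in bytes, as quoted in the thread
CONFIGS = {
    "s390 32 bit": (8192, 96),
    "s390 64 bit": (16384, 160),
}

for name, (stack_size, min_frame) in CONFIGS.items():
    for depth in (70, 80):  # call chain depth estimated in the thread
        used, left = frame_overhead(stack_size, min_frame, depth)
        print(f"{name}, depth {depth}: {used} bytes of frame overhead, "
              f"{left} left ({left // depth} per call on average)")
```

Under these assumptions, an 80-deep chain on 64 bit s390 spends 12800 of 16384 bytes on minimum frames alone, leaving 3584 bytes (about 44 per call) for local variables and anything else on the stack.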

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs




