Re: crashes in 4.6.5

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Aug 24, 2016 at 10:57:26AM -0300, Carlos Carvalho wrote:
> J. Bruce Fields (bfields@xxxxxxxxxxxx) wrote on Wed, Aug 24, 2016 at 10:40:20AM BRT:
> > On Thu, Aug 04, 2016 at 05:36:12PM -0300, Carlos Carvalho wrote:
> > > Our nfs server, running 4.6.5, doesn't last more than 2-3 days. There are
> > > several errors in the log that seem to be nfs4 related.
> > 
> > What's the very first error of any kind?
> 
> I sent all that's in the log; the dumps start with general protection fault,
> and INFO: rcu_sched for the stalls.

Oh, got it, thanks.  I didn't undestand your first message.

> The latest one, 4.7.2 in a nfs3-only
> machine that I reported yesterday, just outputs a few lines with rcu_sched
> stall warnings, only to the console and nothing in logs. I didn't write them
> down this time. The machine still reacted to SysRq commands; I used a forced
> umount and then an immediate reboot. However there was still non-negligible
> filesystem corruption...

What kind of filesystem corruption, and what filesystem is this?  I'm a
little surprised that what looks like a crash in NFSv4 state code should
be corrupting the filesystem.

--b.

> 
> > --b.
> > 
> > > After many of these
> > > errors the machine eventually crashes:
> > > 
> > > Aug  4 07:39:57 urquell kernel: general protection fault: 0000 [#2] SMP 
> > > Aug  4 07:39:57 urquell kernel: Modules linked in:
> > > Aug  4 07:39:57 urquell kernel: CPU: 0 PID: 19621 Comm: nfsd Tainted: G      D   I     4.6.5 #1
> > > Aug  4 07:39:57 urquell kernel: Hardware name: SGI.COM SGI MIS Server/S2600JF, BIOS SE5C600.86B.01.03.0002.062020121504 06/20/2012
> > > Aug  4 07:39:57 urquell kernel: task: ffff881fe24dd940 ti: ffff880e61bd8000 task.ti: ffff880e61bd8000
> > > Aug  4 07:39:57 urquell kernel: RIP: 0010:[<ffffffff81252ef2>]  [<ffffffff81252ef2>] nfsd4_del_conns+0x72/0xc0
> > > Aug  4 07:39:57 urquell kernel: RSP: 0018:ffff880e61bdbce0  EFLAGS: 00010246
> > > Aug  4 07:39:57 urquell kernel: RAX: ffff880b8a34b758 RBX: ffff880b8a34b740 RCX: dead000000000100
> > > Aug  4 07:39:57 urquell kernel: RDX: dead000000000200 RSI: 0000000000000001 RDI: ffff880e22af6078
> > > Aug  4 07:39:57 urquell kernel: RBP: ffff880e22af6000 R08: 0000000000000001 R09: 00004c0ca729bd38
> > > Aug  4 07:39:57 urquell kernel: R10: 0000000000000001 R11: 0000000000000001 R12: ffff881aaa9c4800
> > > Aug  4 07:39:57 urquell kernel: R13: ffff880fe21ab298 R14: ffff881aaa9c4b30 R15: ffff880fe21ab200
> > > Aug  4 07:39:57 urquell kernel: FS:  0000000000000000(0000) GS:ffff880fffc00000(0000) knlGS:0000000000000000
> > > Aug  4 07:39:57 urquell kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > Aug  4 07:39:57 urquell kernel: CR2: 00007f716a49c000 CR3: 0000000001c06000 CR4: 00000000000406f0
> > > Aug  4 07:39:57 urquell kernel: Stack:
> > > Aug  4 07:39:57 urquell kernel: ffff880fe21ab218 ffff881aaa9c4800 ffff880fe21ab200 ffff881aaa9c4b38
> > > Aug  4 07:39:57 urquell kernel: ffff880c9ace8c80 0000000000000000 ffffffff812532a6 ffff881aaa9c4830
> > > Aug  4 07:39:57 urquell kernel: ffff881aaa9c4800 ffff880e61bdbd40 ffff881aaa9c4868 ffffffff81254c7f
> > > Aug  4 07:39:57 urquell kernel: Call Trace:
> > > Aug  4 07:39:57 urquell kernel: [<ffffffff812532a6>] ? free_client+0x26/0x150
> > > Aug  4 07:39:57 urquell kernel: [<ffffffff81254c7f>] ? __destroy_client+0x14f/0x160
> > > Aug  4 07:39:57 urquell kernel: [<ffffffff81255ad9>] ? nfsd4_create_session+0x659/0x850
> > > Aug  4 07:39:57 urquell kernel: [<ffffffff8124b1d2>] ? nfsd4_proc_compound+0x302/0x560
> > > Aug  4 07:39:57 urquell kernel: [<ffffffff8123c7ae>] ? nfsd_dispatch+0x7e/0x160
> > > Aug  4 07:39:57 urquell kernel: [<ffffffff8171f80d>] ? svc_process_common+0x38d/0x510
> > > Aug  4 07:39:57 urquell kernel: [<ffffffff8171fa6d>] ? svc_process+0xdd/0xf0
> > > Aug  4 07:39:57 urquell kernel: [<ffffffff8123c284>] ? nfsd+0xe4/0x150
> > > Aug  4 07:39:57 urquell kernel: [<ffffffff8123c1a0>] ? nfsd_destroy+0x60/0x60
> > > Aug  4 07:39:57 urquell kernel: [<ffffffff810976ca>] ? kthread+0xca/0xe0
> > > Aug  4 07:39:57 urquell kernel: [<ffffffff81252e00>] ? nfsd4_put_drc_mem+0x40/0x40
> > > Aug  4 07:39:57 urquell kernel: [<ffffffff81744192>] ? ret_from_fork+0x22/0x40
> > > Aug  4 07:39:57 urquell kernel: [<ffffffff81097600>] ? kthread_park+0x50/0x50
> > > Aug  4 07:39:57 urquell kernel: Code: 89 10 48 89 1b 48 89 5b 08 41 c6 84 24 30 03 00 00 00 48 8b 6b 10 48 8d 7d 78 e8 1a 0f 4f 00 48 8b 4b 18 48 8d 43 18 48 8b 53 20 <48> 89 51 08 48 89 0a 48 89 43 18 48 89 43 20 c6 45 78 00 48 8b 
> > > Aug  4 07:39:57 urquell kernel: RIP  [<ffffffff81252ef2>] nfsd4_del_conns+0x72/0xc0
> > > Aug  4 07:39:57 urquell kernel: RSP <ffff880e61bdbce0>
> > > Aug  4 07:39:57 urquell kernel: ---[ end trace bbec43f5aa22f6e3 ]---
> > > 
> > > There are also cpu stalls:
> > > 
> > > Aug  4 10:46:14 urquell kernel: INFO: rcu_sched self-detected stall on CPU
> > > Aug  4 10:46:14 urquell kernel: 	4-...: (14997 ticks this GP) idle=76b/140000000000001/0 softirq=30513439/30513439 fqs=13790 
> > > Aug  4 10:46:14 urquell kernel: 	 (t=15000 jiffies g=10123193 c=10123192 q=4324508)
> > > Aug  4 10:46:14 urquell kernel: Task dump for CPU 4:
> > > Aug  4 10:46:14 urquell kernel: nfsd            R  running task        0 19618      2 0x00000008
> > > Aug  4 10:46:14 urquell kernel: ffffffff81c37240 ffffffff810bfc4b ffff880fffc94880 ffffffff81c37240
> > > Aug  4 10:46:14 urquell kernel: 0000000000000000 ffff881fe24dbfc0 ffffffff810c2adc ffff880fffc93bc0
> > > Aug  4 10:46:14 urquell kernel: 0000000000013bc0 0000000000000004 ffff880fffc83e90 ffff880fffc83e90
> > > Aug  4 10:46:14 urquell kernel: Call Trace:
> > > Aug  4 10:46:14 urquell kernel: <IRQ>  [<ffffffff810bfc4b>] ? rcu_dump_cpu_stacks+0x7b/0xb0
> > > Aug  4 10:46:14 urquell kernel: [<ffffffff810c2adc>] ? rcu_check_callbacks+0x3bc/0x680
> > > Aug  4 10:46:14 urquell kernel: [<ffffffff810c467d>] ? update_process_times+0x2d/0x50
> > > Aug  4 10:46:14 urquell kernel: [<ffffffff810d1301>] ? tick_sched_timer+0x41/0x160
> > > Aug  4 10:46:14 urquell kernel: [<ffffffff810c4c13>] ? __hrtimer_run_queues+0xb3/0x150
> > > Aug  4 10:46:14 urquell kernel: [<ffffffff810c5124>] ? hrtimer_interrupt+0x94/0x170
> > > Aug  4 10:46:14 urquell kernel: [<ffffffff8106c4e4>] ? smp_apic_timer_interrupt+0x34/0x50
> > > Aug  4 10:46:14 urquell kernel: [<ffffffff81744b4f>] ? apic_timer_interrupt+0x7f/0x90
> > > Aug  4 10:46:14 urquell kernel: <EOI>  [<ffffffff81091a47>] ? queue_work_on+0x17/0x20
> > > Aug  4 10:46:14 urquell kernel: [<ffffffff81252e66>] ? nfsd4_conn_lost+0x66/0x80
> > > Aug  4 10:46:14 urquell kernel: [<ffffffff8172c9fd>] ? svc_delete_xprt+0xcd/0x130
> > > Aug  4 10:46:14 urquell kernel: [<ffffffff8172d1f9>] ? svc_recv+0x5f9/0x950
> > > Aug  4 10:46:14 urquell kernel: [<ffffffff8172c74f>] ? svc_xprt_release+0x8f/0xf0
> > > Aug  4 10:46:14 urquell kernel: [<ffffffff8123c272>] ? nfsd+0xd2/0x150
> > > Aug  4 10:46:14 urquell kernel: [<ffffffff8123c1a0>] ? nfsd_destroy+0x60/0x60
> > > Aug  4 10:46:14 urquell kernel: [<ffffffff810976ca>] ? kthread+0xca/0xe0
> > > Aug  4 10:46:14 urquell kernel: [<ffffffff81744192>] ? ret_from_fork+0x22/0x40
> > > Aug  4 10:46:14 urquell kernel: [<ffffffff81097600>] ? kthread_park+0x50/0x50
> > > Aug  4 10:46:14 urquell kernel: INFO: rcu_sched detected stalls on CPUs/tasks:
> > > Aug  4 10:46:14 urquell kernel: 	4-...: (14998 ticks this GP) idle=76b/140000000000000/0 softirq=30513439/30513439 fqs=13791 
> > > Aug  4 10:46:14 urquell kernel: 	(detected by 20, t=15004 jiffies, g=10123193, c=10123192, q=4324792)
> > > Aug  4 10:46:14 urquell kernel: Task dump for CPU 4:
> > > Aug  4 10:46:14 urquell kernel: nfsd            R  running task        0 19618      2 0x00000008
> > > Aug  4 10:46:14 urquell kernel: ffff880fff800060 ffff88029099cb00 0001881ff5bc4f00 ffff8805e78f6ac0
> > > Aug  4 10:46:14 urquell kernel: ffff8800bbbe8600 000000000000e3b0 0000000000000040 ffff880fff811800
> > > Aug  4 10:46:14 urquell kernel: ffffffff810917d9 ffffea002ec51bc0 0000000000000206 ffff881a63046800
> > > Aug  4 10:46:14 urquell kernel: Call Trace:
> > > Aug  4 10:46:14 urquell kernel: [<ffffffff810917d9>] ? __queue_work+0x119/0x370
> > > Aug  4 10:46:14 urquell kernel: [<ffffffff81091a40>] ? queue_work_on+0x10/0x20
> > > Aug  4 10:46:14 urquell kernel: [<ffffffff81252e66>] ? nfsd4_conn_lost+0x66/0x80
> > > Aug  4 10:46:14 urquell kernel: [<ffffffff8172c9fd>] ? svc_delete_xprt+0xcd/0x130
> > > Aug  4 10:46:14 urquell kernel: [<ffffffff8172d1f9>] ? svc_recv+0x5f9/0x950
> > > Aug  4 10:46:14 urquell kernel: [<ffffffff8172c74f>] ? svc_xprt_release+0x8f/0xf0
> > > Aug  4 10:46:14 urquell kernel: [<ffffffff8123c272>] ? nfsd+0xd2/0x150
> > > Aug  4 10:46:14 urquell kernel: [<ffffffff8123c1a0>] ? nfsd_destroy+0x60/0x60
> > > Aug  4 10:46:14 urquell kernel: [<ffffffff810976ca>] ? kthread+0xca/0xe0
> > > Aug  4 10:46:14 urquell kernel: [<ffffffff81744192>] ? ret_from_fork+0x22/0x40
> > > Aug  4 10:46:14 urquell kernel: [<ffffffff81097600>] ? kthread_park+0x50/0x50
> > > 
> > > This is the only machine with problems running 4.6.5, and the only one that
> > > uses nfs4.
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> > > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux