Kernel Bug on Linux 4.1.31, possibly nilfs, not sure

Hi, I've started getting "kernel bug" messages on a few systems. At first I wasn't sure whether faulty hardware was to blame. The kernel stack traces do not always mention nilfs, but the crashes roughly coincide with the times when we take nilfs snapshots of our systems (rsync from ext4 into nilfs).

We use the same kernel image on several dozen systems, and aside from one server, there have been only two crashes like this in the past two months. (That one server, though, has crashed about four or five times, which is why I suspected hardware at first.)

However, I just had one happen on a Linode, which is pretty reliable hardware, so I figured I'd post and see what people think. The end of the kernel log is attached.

I'm worried that something is corrupting kernel memory and then causing crashes in unrelated parts of the kernel. I'm really not sure how to narrow it down other than turning off the nilfs snapshots and seeing whether it continues to happen, though then I would have to come up with another backup solution in the meantime.
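
One rough way to check how many of the saved traces mention nilfs at all, assuming the kernel logs are kept as plain text in the same format as the excerpt in this message, is a small script along these lines. The function names (split_traces, mentions_nilfs) are made up for illustration, and this is only a sketch, not a real diagnostic tool:

```python
import re

# Matches a stack frame line such as
# "[1786028.064694]  [<ffffffff80459681>] ? nilfs_grab_buffer+0x47/0xd9"
# and captures the symbol name (assumed log format, taken from this report).
FRAME = re.compile(r"\[<[0-9a-f]+>\]\s+(?:\?\s+)?([A-Za-z0-9_.]+)\+0x")
TRACE_START = "Call Trace:"


def split_traces(log_text):
    """Return a list of traces; each trace is a list of symbol names."""
    traces = []
    current = None
    for line in log_text.splitlines():
        if TRACE_START in line:
            # A new "Call Trace:" header starts a new trace.
            current = []
            traces.append(current)
            continue
        if current is None:
            continue
        match = FRAME.search(line)
        if match:
            current.append(match.group(1))
        else:
            # The first line that is not a frame ends the current trace.
            current = None
    return traces


def mentions_nilfs(trace):
    """True if any frame in the trace is a nilfs symbol."""
    return any(sym.startswith(("nilfs_", "__nilfs_")) for sym in trace)
```

Running split_traces over each saved log and counting the traces for which mentions_nilfs is true gives a quick tally of nilfs versus non-nilfs crashes. A trace with no nilfs frames does not rule nilfs out if memory corruption is involved, so the tally only helps with sorting the reports.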

It's worth noting that the server that crashes frequently also has the largest tree of files. Also, the nilfs filesystems on these systems date back to various versions of nilfs from the 3.* kernel line. It's possible that an old bug is lurking in the filesystem structure, but I don't believe nilfs has a check tool yet, correct?

Thanks,
Michael Conrad

[1785966.799608] ------------[ cut here ]------------
[1785966.799625] Kernel BUG at ffffffff802d1398 [verbose debug info unavailable]
[1785966.799631] invalid opcode: 0000 [#1] SMP
[1785966.799636] Modules linked in: asix r8169 8139too natsemi tg3 bnx2 3c59x tulip e100 e1000 e1000e vmxnet3
[1785966.799651] CPU: 4 PID: 875 Comm: kswapd0 Not tainted 4.1.31 #14
[1785966.799656] task: ffff8801f4f59630 ti: ffff8801f1f98000 task.ti: ffff8801f1f98000
[1785966.799660] RIP: e030:[<ffffffff802d1398>]  [<ffffffff802d1398>] shadow_lru_isolate+0x5a/0x146
[1785966.799674] RSP: e02b:ffff8801f1f9bc38  EFLAGS: 00010002
[1785966.799677] RAX: 0000000000000000 RBX: ffff8801aef61b60 RCX: 000000000000b8b6
[1785966.799681] RDX: 0000000000000001 RSI: ffff8801aef61b60 RDI: ffff8800ff46a040
[1785966.799684] RBP: ffff88009ba06bb8 R08: 0000000000000000 R09: ffff8801f1f9bd00
[1785966.799690] R10: fffffffffffffff2 R11: 0000000000018f68 R12: ffff8800ff46a040
[1785966.799697] R13: ffff88009ba06bd0 R14: ffff8800ff46a048 R15: ffff88018f70fda0
[1785966.799706] FS:  0000000000000000(0000) GS:ffff8801f5d00000(0000) knlGS:00000000f746eb40
[1785966.799713] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[1785966.799718] CR2: 00000000f77bd234 CR3: 000000016a1e1000 CR4: 0000000000042660
[1785966.799725] Stack:
[1785966.799729]  ffff8800ff46a040 ffff8801aef61b60 ffff8800ff46a048 000000000000004e
[1785966.799738]  ffffffff802d133e ffffffff802d0dcb 0000000000000000 ffff8801f1f9bd00
[1785966.799746]  00000000ffffffff 000000000000008e 0000000000000000 0000000000004f53
[1786028.064675]  ffff88010d2fba80 0000000000b7c044 ffff88009ba06bb8 ffff8801334551c8
[1786028.064683] Call Trace:
[1786028.064694]  [<ffffffff80459681>] ? nilfs_grab_buffer+0x47/0xd9
[1786028.064701]  [<ffffffff8045ad19>] ? nilfs_btnode_submit_block+0x3a/0x188
[1786028.064708]  [<ffffffff80d704d2>] ? _raw_spin_lock_irqsave+0x11/0x6d
[1786028.064715]  [<ffffffff8045cd70>] ? __nilfs_btree_get_block+0x3a/0x12e
[1786028.064721]  [<ffffffff8045ba2c>] ? nilfs_btree_node_get_key+0x9/0xf
[1786028.064727]  [<ffffffff8045bae2>] ? nilfs_btree_node_lookup+0x35/0x8d
[1786028.064733]  [<ffffffff8045cf77>] ? nilfs_btree_do_lookup+0x113/0x1ed
[1786028.064739]  [<ffffffff8045d6ac>] ? nilfs_btree_lookup_contig+0x54/0x274
[1786028.064745]  [<ffffffff8045b398>] ? nilfs_bmap_lookup_contig+0x3d/0x63
[1786028.064750]  [<ffffffff804559ae>] ? nilfs_get_block+0x61/0x19a
[1786028.064758]  [<ffffffff8031a2ee>] ? __block_write_begin+0x159/0x320
[1786028.064763]  [<ffffffff8045594d>] ? __nilfs_mark_inode_dirty+0x8d/0x8d
[1786028.064768]  [<ffffffff8031a4fb>] ? block_write_begin+0x46/0x6f
[1786028.064773]  [<ffffffff8045594d>] ? __nilfs_mark_inode_dirty+0x8d/0x8d
[1786028.064779]  [<ffffffff80455d9e>] ? nilfs_write_begin+0x53/0x87
[1786028.064787]  [<ffffffff802ba29b>] ? generic_perform_write+0xd0/0x18b
[1786028.064795]  [<ffffffff8030926d>] ? file_update_time+0xa9/0xbd
[1786028.064801]  [<ffffffff802bb0d9>] ? __generic_file_write_iter+0x97/0x136
[1786028.064807]  [<ffffffff802bb287>] ? generic_file_write_iter+0x10f/0x179
[1786028.064952]  [<ffffffff802f47d8>] ? __vfs_write+0x93/0xbb
[1786028.064965]  [<ffffffff802f4dcf>] ? vfs_write+0xaf/0x152
[1786028.064975]  [<ffffffff802f5778>] ? SyS_write+0x48/0x82
[1786028.064987]  [<ffffffff80d72ce8>] ? ia32_do_call+0x13/0x13
[1786208.081234] INFO: rcu_sched detected stalls on CPUs/tasks:
[1786208.081248]        3: (1 GPs behind) idle=015/140000000000000/0 softirq=77304788/77304790 fqs=13495
[1786208.081255]        (detected by 2, t=72007 jiffies, g=20472588, c=20472587, q=1831)
[1786208.081263] Task dump for CPU 3:
[1786208.081268] rsync           R  running task        0  8287   8286 0x20020008
[1786208.081276]  0000000000b7c044 0000000000b7c044 0000000000b7c044 0000000000000000
[1786208.081284]  0000000000040000 000000000000000c ffffffff80459681 ffff88010d2fb9e0
[1786208.081292]  ffff88010d2fba80 0000000000b7c044 ffff88009ba06bb8 ffff8801334551c8
[1786208.081299] Call Trace:
[1786208.081310]  [<ffffffff80459681>] ? nilfs_grab_buffer+0x47/0xd9
[1786208.081315]  [<ffffffff8045ad19>] ? nilfs_btnode_submit_block+0x3a/0x188
[1786208.081322]  [<ffffffff80d704d2>] ? _raw_spin_lock_irqsave+0x11/0x6d
[1786208.081328]  [<ffffffff8045cd70>] ? __nilfs_btree_get_block+0x3a/0x12e
[1786208.081333]  [<ffffffff8045ba2c>] ? nilfs_btree_node_get_key+0x9/0xf
[1786208.081339]  [<ffffffff8045bae2>] ? nilfs_btree_node_lookup+0x35/0x8d
[1786208.081344]  [<ffffffff8045cf77>] ? nilfs_btree_do_lookup+0x113/0x1ed
[1786208.081349]  [<ffffffff8045d6ac>] ? nilfs_btree_lookup_contig+0x54/0x274
[1786208.081355]  [<ffffffff8045b398>] ? nilfs_bmap_lookup_contig+0x3d/0x63
[1786208.081361]  [<ffffffff804559ae>] ? nilfs_get_block+0x61/0x19a
[1786208.081367]  [<ffffffff8031a2ee>] ? __block_write_begin+0x159/0x320
[1786208.081372]  [<ffffffff8045594d>] ? __nilfs_mark_inode_dirty+0x8d/0x8d
[1786208.081379]  [<ffffffff8031a4fb>] ? block_write_begin+0x46/0x6f
[1786208.081386]  [<ffffffff8045594d>] ? __nilfs_mark_inode_dirty+0x8d/0x8d
[1786208.081393]  [<ffffffff80455d9e>] ? nilfs_write_begin+0x53/0x87
[1786208.081403]  [<ffffffff802ba29b>] ? generic_perform_write+0xd0/0x18b
[1786208.081413]  [<ffffffff8030926d>] ? file_update_time+0xa9/0xbd
[1786208.081419]  [<ffffffff802bb0d9>] ? __generic_file_write_iter+0x97/0x136
[1786208.081425]  [<ffffffff802bb287>] ? generic_file_write_iter+0x10f/0x179
[1786208.081807]  [<ffffffff802f47d8>] ? __vfs_write+0x93/0xbb
[1786208.081813]  [<ffffffff802f4dcf>] ? vfs_write+0xaf/0x152
[1786208.081820]  [<ffffffff802f5778>] ? SyS_write+0x48/0x82
[1786208.081825]  [<ffffffff80d72ce8>] ? ia32_do_call+0x13/0x13
[1786208.081830] rcu_sched kthread starved for 44124 jiffies!
