Thanks for your input. I checked dmesg, and it doesn't look good, I think.

[1424954.519747] Pid: 10648, comm: glusterfsd Tainted: G W 3.2.0-26-generic #41-Ubuntu Dell Inc. PowerEdge R710/0MD99X
[1424954.520405] RIP: 0010:[<ffffffff8125ea6a>]  [<ffffffff8125ea6a>] jbd2_journal_stop+0x29a/0x2a0
[1424954.530985] RSP: 0018:ffff8824043919f8  EFLAGS: 00010282
[1424954.541422] RAX: ffff88240471c4d0 RBX: ffff882402b89d20 RCX: 000000000003ffff
[1424954.562380] RDX: ffff882402b89d08 RSI: 0000000000000ff4 RDI: ffff882402b89d20
[1424954.583295] RBP: ffff882404391a48 R08: 000000000000000a R09: 0000000000000000
[1424954.604255] R10: 0000000000000000 R11: 0000000000000000 R12: ffff882402b89cf0
[1424954.625440] R13: 0000000000001000 R14: 0000000000000ff4 R15: 00000000ffffffea
[1424954.625441] FS:  00007f424a6e6700(0000) GS:ffff88247fc00000(0000) knlGS:0000000000000000
[1424954.625443] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1424954.625444] CR2: 00002b81244e2000 CR3: 00000012018da000 CR4: 00000000000026e0
[1424954.625445] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[1424954.625447] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[1424954.625448] Process glusterfsd (pid: 10648, threadinfo ffff882404390000, task ffff88240471c4d0)
[1424954.625450] Stack:
[1424954.625451]  ffff882402b89cf0 ffff882402b89d20 ffff882404391a18 ffffffff8106730a
[1424954.625454]  ffff882404391a78 00000000ffffffea ffffffff81826f10 0000000000001000
[1424954.625456]  0000000000000ff4 00000000ffffffea ffff882404391a78 ffffffff81235718
[1424954.625459] Call Trace:
[1424954.625461]  [<ffffffff8106730a>] ? warn_slowpath_null+0x1a/0x20
[1424954.625463]  [<ffffffff81235718>] __ext4_journal_stop+0x78/0xa0
[1424954.625466]  [<ffffffff81241f34>] __ext4_handle_dirty_metadata+0xa4/0x130
[1424954.625468]  [<ffffffff81251dd3>] ? ext4_xattr_block_set+0xd3/0x670
[1424954.625470]  [<ffffffff81216bb6>] ext4_do_update_inode+0x2c6/0x4c0
[1424954.625472]  [<ffffffff81219251>] ext4_mark_iloc_dirty+0x71/0x90
[1424954.625473]  [<ffffffff812526da>] ext4_xattr_set_handle+0x23a/0x4f0
[1424954.625476]  [<ffffffff81252a22>] ext4_xattr_set+0x92/0x100
[1424954.625477]  [<ffffffff81250cf0>] ? ext4_xattr_find_entry+0x90/0x100
[1424954.625479]  [<ffffffff812534fd>] ext4_xattr_trusted_set+0x2d/0x30
[1424954.625481]  [<ffffffff8119afcb>] generic_setxattr+0x6b/0x90
[1424954.625483]  [<ffffffff8119b82b>] __vfs_setxattr_noperm+0x7b/0x1c0
[1424954.625485]  [<ffffffff812dbc3e>] ? evm_inode_setxattr+0xe/0x10
[1424954.625487]  [<ffffffff8119ba2c>] vfs_setxattr+0xbc/0xc0
[1424954.625489]  [<ffffffff8119baf6>] setxattr+0xc6/0x120
[1424954.625491]  [<ffffffff816599ce>] ? _raw_spin_lock+0xe/0x20
[1424954.625492]  [<ffffffff8109efa3>] ? futex_wake+0x113/0x130
[1424954.625494]  [<ffffffff810a0aa8>] ? do_futex+0xd8/0x1b0
[1424954.625496]  [<ffffffff8119bf0b>] sys_fsetxattr+0xbb/0xe0
[1424954.625498]  [<ffffffff81661fc2>] system_call_fastpath+0x16/0x1b
[1424954.625499] Code: c8 4c 89 7d c0 49 87 06 48 8d 7d c0 31 f6 48 89 45 c8 48 8b 45 c8 e8 d6 a1 3f 00 0f b6 43 14 48 8b 55 b8 83 e0 01 e9 9b fe ff ff <0f> 0b 0f 0b 66 90 55 48 89 e5 66 66 66 66 90 be 01 00 00 00 e8
[1424954.625512] RIP  [<ffffffff8125ea6a>] jbd2_journal_stop+0x29a/0x2a0
[1424954.625514]  RSP <ffff8824043919f8>
[1424954.649028] ---[ end trace 4901c4efb88aa10c ]---
[1511244.755144] EXT4-fs (sdb1): error count: 2
[1511244.761993] EXT4-fs (sdb1): initial error at 1343085498: ext4_xattr_release_block:496
[1511244.775557] EXT4-fs (sdb1): last error at 1343085498: ext4_xattr_release_block:504
[1597554.484787] EXT4-fs (sdb1): error count: 2
[1597554.492094] EXT4-fs (sdb1): initial error at 1343085498: ext4_xattr_release_block:496
[1597554.507148] EXT4-fs (sdb1): last error at 1343085498: ext4_xattr_release_block:504
[1683864.286198] EXT4-fs (sdb1): error count: 2
[1683864.294365] EXT4-fs (sdb1): initial error at 1343085498: ext4_xattr_release_block:496
[1683864.310651] EXT4-fs (sdb1): last error at 1343085498: ext4_xattr_release_block:504
[1770174.105169] EXT4-fs (sdb1): error count: 2
[1770174.113928] EXT4-fs (sdb1): initial error at 1343085498: ext4_xattr_release_block:496
[1770174.131699] EXT4-fs (sdb1): last error at 1343085498: ext4_xattr_release_block:504
[1856483.884918] EXT4-fs (sdb1): error count: 2
[1856483.894541] EXT4-fs (sdb1): initial error at 1343085498: ext4_xattr_release_block:496
[1856483.914191] EXT4-fs (sdb1): last error at 1343085498: ext4_xattr_release_block:504
[1942793.613632] EXT4-fs (sdb1): error count: 2
[1942793.623403] EXT4-fs (sdb1): initial error at 1343085498: ext4_xattr_release_block:496
[1942793.642674] EXT4-fs (sdb1): last error at 1343085498: ext4_xattr_release_block:504
[2029103.473721] EXT4-fs (sdb1): error count: 2
[2029103.483627] EXT4-fs (sdb1): initial error at 1343085498: ext4_xattr_release_block:496
[2029103.503299] EXT4-fs (sdb1): last error at 1343085498: ext4_xattr_release_block:504

I checked the RAID (built-in hardware controller from Dell), and all the disks are OK. The next step would be to run an fsck first, I guess (a rough command sketch is below, after the quoted message). But why can such errors occur? Any ideas?

Cheers,
Christian

2012/7/31 Brian Candler <B.Candler at pobox.com>

> On Tue, Jul 31, 2012 at 02:04:25PM +0200, Christian Wittwer wrote:
> >    b) Can I just restart glusterd on that node to trigger the self
> >    healing?
>
> I would double-check that the underlying filesystem on
> unic-prd-os-compute4:/data/brick0 is OK first. Look for errors in dmesg;
> look at your RAID status (e.g. if it's mdraid then cat /proc/mdstat);
> check RAID logs, SMART logs etc.
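
For reference, a rough sketch of the checks discussed above. It assumes the brick is /dev/sdb1 mounted at /data/brick0 (as in the log and the quote), the Ubuntu glusterfs-server init service, and a placeholder <volname>; adjust to your setup:

    # Stop the Gluster brick processes on this node before touching the filesystem
    # (service name assumed from the Ubuntu glusterfs-server package)
    service glusterfs-server stop
    umount /data/brick0

    # Read-only check first; -f forces a full pass even if the fs is marked clean
    e2fsck -fn /dev/sdb1
    # If that reports problems, repair (use -fy for fully non-interactive fixing)
    e2fsck -fp /dev/sdb1

    # The superblock stores the error count and first/last error seen in dmesg
    dumpe2fs -h /dev/sdb1 | grep -i error

    # SMART data; behind a Dell PERC you typically need the megaraid device type.
    # The ",0" disk index is only an example - iterate over the physical disks.
    smartctl -a -d megaraid,0 /dev/sdb

    mount /data/brick0
    service glusterfs-server start

    # Then check and trigger self-heal (GlusterFS 3.3+)
    gluster volume heal <volname> info
    gluster volume heal <volname>

As far as I know, the repeated "EXT4-fs (sdb1): error count" lines come from that superblock error counter, so they will keep reappearing periodically until a successful fsck clears it.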