Thanks for your input. I checked dmesg, and it doesn't look good, I think.

[1424954.519747] Pid: 10648, comm: glusterfsd Tainted: G W 3.2.0-26-generic #41-Ubuntu Dell Inc. PowerEdge R710/0MD99X
[1424954.520405] RIP: 0010:[<ffffffff8125ea6a>]  [<ffffffff8125ea6a>] jbd2_journal_stop+0x29a/0x2a0
[1424954.530985] RSP: 0018:ffff8824043919f8  EFLAGS: 00010282
[1424954.541422] RAX: ffff88240471c4d0 RBX: ffff882402b89d20 RCX: 000000000003ffff
[1424954.562380] RDX: ffff882402b89d08 RSI: 0000000000000ff4 RDI: ffff882402b89d20
[1424954.583295] RBP: ffff882404391a48 R08: 000000000000000a R09: 0000000000000000
[1424954.604255] R10: 0000000000000000 R11: 0000000000000000 R12: ffff882402b89cf0
[1424954.625440] R13: 0000000000001000 R14: 0000000000000ff4 R15: 00000000ffffffea
[1424954.625441] FS:  00007f424a6e6700(0000) GS:ffff88247fc00000(0000) knlGS:0000000000000000
[1424954.625443] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1424954.625444] CR2: 00002b81244e2000 CR3: 00000012018da000 CR4: 00000000000026e0
[1424954.625445] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[1424954.625447] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[1424954.625448] Process glusterfsd (pid: 10648, threadinfo ffff882404390000, task ffff88240471c4d0)
[1424954.625450] Stack:
[1424954.625451]  ffff882402b89cf0 ffff882402b89d20 ffff882404391a18 ffffffff8106730a
[1424954.625454]  ffff882404391a78 00000000ffffffea ffffffff81826f10 0000000000001000
[1424954.625456]  0000000000000ff4 00000000ffffffea ffff882404391a78 ffffffff81235718
[1424954.625459] Call Trace:
[1424954.625461]  [<ffffffff8106730a>] ? warn_slowpath_null+0x1a/0x20
[1424954.625463]  [<ffffffff81235718>] __ext4_journal_stop+0x78/0xa0
[1424954.625466]  [<ffffffff81241f34>] __ext4_handle_dirty_metadata+0xa4/0x130
[1424954.625468]  [<ffffffff81251dd3>] ? ext4_xattr_block_set+0xd3/0x670
[1424954.625470]  [<ffffffff81216bb6>] ext4_do_update_inode+0x2c6/0x4c0
[1424954.625472]  [<ffffffff81219251>] ext4_mark_iloc_dirty+0x71/0x90
[1424954.625473]  [<ffffffff812526da>] ext4_xattr_set_handle+0x23a/0x4f0
[1424954.625476]  [<ffffffff81252a22>] ext4_xattr_set+0x92/0x100
[1424954.625477]  [<ffffffff81250cf0>] ? ext4_xattr_find_entry+0x90/0x100
[1424954.625479]  [<ffffffff812534fd>] ext4_xattr_trusted_set+0x2d/0x30
[1424954.625481]  [<ffffffff8119afcb>] generic_setxattr+0x6b/0x90
[1424954.625483]  [<ffffffff8119b82b>] __vfs_setxattr_noperm+0x7b/0x1c0
[1424954.625485]  [<ffffffff812dbc3e>] ? evm_inode_setxattr+0xe/0x10
[1424954.625487]  [<ffffffff8119ba2c>] vfs_setxattr+0xbc/0xc0
[1424954.625489]  [<ffffffff8119baf6>] setxattr+0xc6/0x120
[1424954.625491]  [<ffffffff816599ce>] ? _raw_spin_lock+0xe/0x20
[1424954.625492]  [<ffffffff8109efa3>] ? futex_wake+0x113/0x130
[1424954.625494]  [<ffffffff810a0aa8>] ? do_futex+0xd8/0x1b0
[1424954.625496]  [<ffffffff8119bf0b>] sys_fsetxattr+0xbb/0xe0
[1424954.625498]  [<ffffffff81661fc2>] system_call_fastpath+0x16/0x1b
[1424954.625499] Code: c8 4c 89 7d c0 49 87 06 48 8d 7d c0 31 f6 48 89 45 c8 48 8b 45 c8 e8 d6 a1 3f 00 0f b6 43 14 48 8b 55 b8 83 e0 01 e9 9b fe ff ff <0f> 0b 0f 0b 66 90 55 48 89 e5 66 66 66 66 90 be 01 00 00 00 e8
[1424954.625512] RIP  [<ffffffff8125ea6a>] jbd2_journal_stop+0x29a/0x2a0
[1424954.625514]  RSP <ffff8824043919f8>
[1424954.649028] ---[ end trace 4901c4efb88aa10c ]---
[1511244.755144] EXT4-fs (sdb1): error count: 2
[1511244.761993] EXT4-fs (sdb1): initial error at 1343085498: ext4_xattr_release_block:496
[1511244.775557] EXT4-fs (sdb1): last error at 1343085498: ext4_xattr_release_block:504
[1597554.484787] EXT4-fs (sdb1): error count: 2
[1597554.492094] EXT4-fs (sdb1): initial error at 1343085498: ext4_xattr_release_block:496
[1597554.507148] EXT4-fs (sdb1): last error at 1343085498: ext4_xattr_release_block:504
[1683864.286198] EXT4-fs (sdb1): error count: 2
[1683864.294365] EXT4-fs (sdb1): initial error at 1343085498: ext4_xattr_release_block:496
[1683864.310651] EXT4-fs (sdb1): last error at 1343085498: ext4_xattr_release_block:504
[1770174.105169] EXT4-fs (sdb1): error count: 2
[1770174.113928] EXT4-fs (sdb1): initial error at 1343085498: ext4_xattr_release_block:496
[1770174.131699] EXT4-fs (sdb1): last error at 1343085498: ext4_xattr_release_block:504
[1856483.884918] EXT4-fs (sdb1): error count: 2
[1856483.894541] EXT4-fs (sdb1): initial error at 1343085498: ext4_xattr_release_block:496
[1856483.914191] EXT4-fs (sdb1): last error at 1343085498: ext4_xattr_release_block:504
[1942793.613632] EXT4-fs (sdb1): error count: 2
[1942793.623403] EXT4-fs (sdb1): initial error at 1343085498: ext4_xattr_release_block:496
[1942793.642674] EXT4-fs (sdb1): last error at 1343085498: ext4_xattr_release_block:504
[2029103.473721] EXT4-fs (sdb1): error count: 2
[2029103.483627] EXT4-fs (sdb1): initial error at 1343085498: ext4_xattr_release_block:496
[2029103.503299] EXT4-fs (sdb1): last error at 1343085498: ext4_xattr_release_block:504

I checked the RAID (built-in hardware controller from Dell), and all the disks are OK. The next step would be to run an fsck first, I guess (a rough command sketch is below, after the quoted message). But why can such errors occur? Any ideas?

Cheers,
Christian

2012/7/31 Brian Candler <B.Candler at pobox.com>

> On Tue, Jul 31, 2012 at 02:04:25PM +0200, Christian Wittwer wrote:
> >    b) Can I just restart glusterd on that node to trigger the self
> >    healing?
>
> I would double-check that the underlying filesystem on
> unic-prd-os-compute4:/data/brick0 is OK first. Look for errors in dmesg;
> look at your RAID status (e.g. if it's mdraid then cat /proc/mdstat);
> check RAID logs, SMART logs etc.
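
For reference, a rough sketch of the checks discussed above. It assumes the brick is /dev/sdb1 mounted at /data/brick0 (as in the log and the quote), the Ubuntu glusterfs-server init service, and a placeholder <volname>; adjust to your setup:

    # Stop the Gluster brick processes on this node before touching the filesystem
    # (service name assumed from the Ubuntu glusterfs-server package)
    service glusterfs-server stop
    umount /data/brick0

    # Read-only check first; -f forces a full pass even if the fs is marked clean
    e2fsck -fn /dev/sdb1
    # If that reports problems, repair (use -fy for fully non-interactive fixing)
    e2fsck -fp /dev/sdb1

    # The superblock stores the error count and first/last error seen in dmesg
    dumpe2fs -h /dev/sdb1 | grep -i error

    # SMART data; behind a Dell PERC you typically need the megaraid device type.
    # The ",0" disk index is only an example - iterate over the physical disks.
    smartctl -a -d megaraid,0 /dev/sdb

    mount /data/brick0
    service glusterfs-server start

    # Then check and trigger self-heal (GlusterFS 3.3+)
    gluster volume heal <volname> info
    gluster volume heal <volname>

As far as I know, the repeated "EXT4-fs (sdb1): error count" lines come from that superblock error counter, so they will keep reappearing periodically until a successful fsck clears it.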