Daniel Dehennin <daniel.dehennin@xxxxxxxxxxxx> writes:

> Thanks, I'm using 3.1.6.
>
> Tonight I'll build version 3.1.8 from Git[1] and run “fsck.gfs2 -p” on the fs.

Hello,

I preferred to run the fsck on the filesystem, twice[1], instead of doing the “gfs2_edit savemeta”:

1. “fsck.gfs2 -p <BLOCK DEVICE>” was quick
2. “fsck.gfs2 -f -p <BLOCK DEVICE>” took 4 hours

The cluster was brought back up afterwards and everything was working fine until yesterday:

Feb 18 19:13:22 nebula3 kernel: [293848.682606] GFS2: buf_blk = 0x2089 old_state=0, new_state=0
Feb 18 19:13:22 nebula3 kernel: [293848.682612] GFS2: rgrp=0xc0c5667 bi_start=0x0
Feb 18 19:13:22 nebula3 kernel: [293848.682614] GFS2: bi_offset=0x80 bi_len=0xf80
Feb 18 19:13:22 nebula3 kernel: [293848.682619] CPU: 6 PID: 7057 Comm: kworker/6:8 Tainted: G W 3.13.0-78-generic #122-Ubuntu
Feb 18 19:13:22 nebula3 kernel: [293848.682621] Hardware name: Dell Inc. PowerEdge M620/0T36VK, BIOS 2.2.7 01/21/2014
Feb 18 19:13:22 nebula3 kernel: [293848.682637] Workqueue: delete_workqueue delete_work_func [gfs2]
Feb 18 19:13:22 nebula3 kernel: [293848.682640] 000000000c0c7705 ffff8811256c59d8 ffffffff81725768 000000000c0c76f6
Feb 18 19:13:22 nebula3 kernel: [293848.682648] ffff8811256c5a30 ffffffffa05bebbf ffff880f5ffe9200 00000000a05c5977
Feb 18 19:13:22 nebula3 kernel: [293848.682653] ffff880f1ee574c8 0000000000002089 ffff882e8c622000 0000000000000010
Feb 18 19:13:22 nebula3 kernel: [293848.682658] Call Trace:
Feb 18 19:13:22 nebula3 kernel: [293848.682668] [<ffffffff81725768>] dump_stack+0x45/0x56
Feb 18 19:13:22 nebula3 kernel: [293848.682681] [<ffffffffa05bebbf>] rgblk_free+0x1ff/0x230 [gfs2]
Feb 18 19:13:22 nebula3 kernel: [293848.682693] [<ffffffffa05c0f34>] __gfs2_free_blocks+0x34/0x120 [gfs2]
Feb 18 19:13:22 nebula3 kernel: [293848.682700] [<ffffffffa059d076>] recursive_scan+0x5b6/0x6a0 [gfs2]
Feb 18 19:13:22 nebula3 kernel: [293848.682707] [<ffffffffa059cf2c>] recursive_scan+0x46c/0x6a0 [gfs2]
Feb 18 19:13:22 nebula3 kernel: [293848.682714] [<ffffffff8133a0f1>] ? submit_bio+0x71/0x150
Feb 18 19:13:22 nebula3 kernel: [293848.682720] [<ffffffff811f6146>] ? bio_alloc_bioset+0x196/0x2a0
Feb 18 19:13:22 nebula3 kernel: [293848.682727] [<ffffffff811f11d0>] ? _submit_bh+0x150/0x200
Feb 18 19:13:22 nebula3 kernel: [293848.682734] [<ffffffffa059cf2c>] recursive_scan+0x46c/0x6a0 [gfs2]
Feb 18 19:13:22 nebula3 kernel: [293848.682744] [<ffffffffa05bb4f5>] ? gfs2_quota_hold+0x175/0x1f0 [gfs2]
Feb 18 19:13:22 nebula3 kernel: [293848.682752] [<ffffffffa059d25a>] trunc_dealloc+0xfa/0x120 [gfs2]
Feb 18 19:13:22 nebula3 kernel: [293848.682760] [<ffffffffa05a898e>] ? gfs2_glock_wait+0x3e/0x80 [gfs2]
Feb 18 19:13:22 nebula3 kernel: [293848.682769] [<ffffffffa05aa190>] ? gfs2_glock_nq+0x280/0x430 [gfs2]
Feb 18 19:13:22 nebula3 kernel: [293848.682777] [<ffffffffa059eef0>] gfs2_file_dealloc+0x10/0x20 [gfs2]
Feb 18 19:13:22 nebula3 kernel: [293848.682787] [<ffffffffa05c1db3>] gfs2_evict_inode+0x2b3/0x3e0 [gfs2]
Feb 18 19:13:22 nebula3 kernel: [293848.682796] [<ffffffffa05c1c13>] ? gfs2_evict_inode+0x113/0x3e0 [gfs2]
Feb 18 19:13:22 nebula3 kernel: [293848.682802] [<ffffffff811d9a40>] evict+0xb0/0x1b0
Feb 18 19:13:22 nebula3 kernel: [293848.682807] [<ffffffff811da255>] iput+0xf5/0x180
Feb 18 19:13:22 nebula3 kernel: [293848.682815] [<ffffffffa05a90ec>] delete_work_func+0x5c/0x90 [gfs2]
Feb 18 19:13:22 nebula3 kernel: [293848.682822] [<ffffffff81083cd2>] process_one_work+0x182/0x450
Feb 18 19:13:22 nebula3 kernel: [293848.682827] [<ffffffff81084ac1>] worker_thread+0x121/0x410
Feb 18 19:13:22 nebula3 kernel: [293848.682832] [<ffffffff810849a0>] ? rescuer_thread+0x430/0x430
Feb 18 19:13:22 nebula3 kernel: [293848.682837] [<ffffffff8108b8a2>] kthread+0xd2/0xf0
Feb 18 19:13:22 nebula3 kernel: [293848.682841] [<ffffffff8108b7d0>] ? kthread_create_on_node+0x1c0/0x1c0
Feb 18 19:13:22 nebula3 kernel: [293848.682846] [<ffffffff817362a8>] ret_from_fork+0x58/0x90
Feb 18 19:13:22 nebula3 kernel: [293848.682850] [<ffffffff8108b7d0>] ? kthread_create_on_node+0x1c0/0x1c0
Feb 18 19:13:22 nebula3 kernel: [293848.682855] GFS2: fsid=yggdrasil:datastores.1: fatal: filesystem consistency error
Feb 18 19:13:22 nebula3 kernel: [293848.682855] GFS2: fsid=yggdrasil:datastores.1: RG = 202135143
Feb 18 19:13:22 nebula3 kernel: [293848.682855] GFS2: fsid=yggdrasil:datastores.1: function = gfs2_setbit, file = /build/linux-OTIHGI/linux-3.13.0/fs/gfs2/rgrp.c, line = 103
Feb 18 19:13:22 nebula3 kernel: [293848.682859] GFS2: fsid=yggdrasil:datastores.1: about to withdraw this file system
Feb 18 19:13:22 nebula3 kernel: [293848.699050] GFS2: fsid=yggdrasil:datastores.1: dirty_inode: glock -5
Feb 18 19:13:22 nebula3 kernel: [293848.705401] GFS2: fsid=yggdrasil:datastores.1: dirty_inode: glock -5

Now the “always faulty node” is down and I'm running the “gfs2_edit savemeta” from the other node (see the command sketch appended below).

I'm wondering whether I should upgrade the kernels to something much newer than 3.13.0; my Ubuntu Trusty has proposed kernels up to 4.2.0.

Regards.

Footnotes:
[1] The logs are attached to this email

--
Daniel Dehennin
Retrieve my GPG key: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
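For reference, a minimal sketch of the two follow-up steps discussed in the message above. The device path, the output file and the HWE kernel package name are assumptions for illustration, not values taken from the original message:

    # Save the GFS2 metadata (not the file data) to a file for offline analysis;
    # the block device and output paths below are placeholders.
    gfs2_edit savemeta /dev/mapper/datastores /var/tmp/datastores.meta

    # On Ubuntu Trusty, the 4.2 kernel is assumed to ship in the wily HWE stack.
    sudo apt-get install linux-generic-lts-wily
    sudo reboot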