In the message dated: Mon, 22 Mar 2010 09:52:21 EDT,
The pithy ruminations from Bob Peterson on
<Re: failure during gfs2_grow caused node crash & data loss> were:

=> ----- bergman@xxxxxxxxxxxx wrote:
=> | I just had a serious problem with gfs2_grow which caused a loss of
=> | data and a cluster node reboot.
=> |
=> | I was attempting to grow a gfs2 volume from 50GB to 145GB. The volume
=> | was mounted on both cluster nodes at the start of running "gfs2_grow".
=> | When I umounted the volume from _one_ node (not the node where
=> | gfs2_grow was running), the machine running gfs2_grow rebooted and
=> | the filesystem is damaged.
=> |
=> | The sequence of commands was as follows. Each command was successful
=> | until the "umount".
=> (snip)
=> | Mark
=>
=> Hi Mark,

Thanks for getting back to me.

=> There's a good chance this was caused by bugzilla bug #546683 which
=> is scheduled to be released in 5.5. However, I've also seen some
=> problems like this when a logical volume in LVM isn't marked as
=> clustered. Make sure it is with the "vgs" command (check if the flags
=> end with a "c") and if not, do vgchange -cy <volgrp>

Yes, the volume group is clustered (it contains 5 other filesystems, some
of which are gfs2 clustered) and works fine.

=> As for fsck.gfs2, it should never segfault. IMHO, this is a bug
=> so please open a bugzilla record: Product: "Red Hat Enterprise Linux 5"

Can I paraphrase that when I talk to our developers? I've been trying to
convince them that (in most cases) segfault == bug. :)

=> and component "gfs2-utils". Assign it to me.

Will do...once my Bugzilla account is set up.

=> As for recovering your volume, you can try this but it's not guaranteed
=> to work:
=> (1) Reduce the volume to its size from before the gfs2_grow.

That claims to be successful. The 'lvs' command shows the volume at its
previous size. (My rough command sequence is recapped below.)

=> (2) Mount it from one node only, if you can (it may crash).

I'm unable to mount the volume:

    /sbin/mount.gfs2: error mounting /dev/mapper/global_vg-legacy on /legacy: No such file or directory

An fsck.gfs2 at this point reports:

    Initializing fsck
    Recovering journals (this may take a while)...
    Journal recovery complete.
    Validating Resource Group index.
    Level 1 RG check.
    (level 1 failed)
    Level 2 RG check.
    L2: number of rgs in the index = 85.
    WARNING: rindex file is corrupt.
    (level 2 failed)
    Level 3 RG check.
    RG 1 at block 0x11 intact [length 0x3b333]
    RG 2 at block 0x3B344 intact [length 0x3b32f]
    RG 3 at block 0x76673 intact [length 0x3b32f]
    RG 4 at block 0xB19A2 intact [length 0x3b32f]
    RG 5 at block 0xECCD1 intact [length 0x3b32f]
    RG 6 at block 0x128000 intact [length 0x3b32f]
    *  RG 7 at block 0x16332F *** DAMAGED *** [length 0x3b32f]
    *  RG 8 at block 0x19E65E *** DAMAGED *** [length 0x3b32f]
    *  RG 9 at block 0x1D998D *** DAMAGED *** [length 0x3b32f]
    *  RG 10 at block 0x214CBC *** DAMAGED *** [length 0x3b32f]
    Error: too many bad RGs.
    Error rebuilding rg list.
    (level 3 failed)
    RG recovery impossible; I can't fix this file system.

=> (3) If it lets you mount it, run gfs2_grow again.
=> (4) Unmount the volume.
=> (5) Mount the volume from both nodes.
=>
=> If that doesn't work or if the system can't properly mount the volume
=> your choices are either (1) reformat the volume and restore from

I figured I'll have to do that...so I'll keep playing with the
alternatives first.

=> backup, (2) Use gfs2_edit to patch the i_size field of the rindex file

Do you mean "di_size"?

=> to be a fairly small multiple of 96 then repeat steps 1 through 4.
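For reference, here's roughly the sequence I'm working from for steps (1)
through (5), using the real device and mount point from the output above.
The lvreduce invocation for step (1) is my assumption (50G being the
pre-grow size), so correct me if a different tool or size is intended:

    # (1) shrink the LV back to its pre-grow size (lvreduce assumed)
    lvreduce -L 50G /dev/global_vg/legacy

    # (2) mount from a single node only
    mount -t gfs2 /dev/mapper/global_vg-legacy /legacy

    # (3) if the mount succeeds, re-run the grow from that node
    gfs2_grow /legacy

    # (4) unmount from that node
    umount /legacy

    # (5) then mount the volume from both nodes as usual

As reported above, it's step (2) that fails for me with the mount.gfs2
error.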
According to "gfs2_edit -p rindex", the initial value of di_size is: di_size 8192 0x2000 Does that give any indication of an appropriate "fairly small multiple"? Thanks, Mark => => Regards, => => Bob Peterson => Red Hat File Systems => -- Linux-cluster mailing list Linux-cluster@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/linux-cluster
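P.S. Trying to work out that "fairly small multiple of 96" myself, on the
assumption that each rindex entry is 96 bytes (so di_size should normally
be <number of RGs> * 96):

    85 RGs * 96 = 8160  (close to, but not the same as, the current di_size
                         of 8192, which I suppose fits with fsck calling the
                         rindex corrupt)
     6 RGs * 96 =  576  (covering only the RGs that fsck reports as intact)

Is 576 the sort of value you had in mind, or should the truncated rindex
cover more than the first six RGs?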