My responses inline:

> hi,
>
> our gfs2 datasets are down; when i try to do a mount i get:
>
> [root@DBT1 ~]# mount -a
> /sbin/mount.gfs2: node not a member of the default fence domain
> /sbin/mount.gfs2: error mounting lockproto lock_dlm
> /sbin/mount.gfs2: node not a member of the default fence domain
> /sbin/mount.gfs2: error mounting lockproto lock_dlm
> /sbin/mount.gfs2: node not a member of the default fence domain
> /sbin/mount.gfs2: error mounting lockproto lock_dlm
> /sbin/mount.gfs2: node not a member of the default fence domain
> /sbin/mount.gfs2: error mounting lockproto lock_dlm
> /sbin/mount.gfs2: node not a member of the default fence domain
> /sbin/mount.gfs2: error mounting lockproto lock_dlm
> /sbin/mount.gfs2: node not a member of the default fence domain
> /sbin/mount.gfs2: error mounting lockproto lock_dlm

This makes me think the node trying to mount your GFS file system is not
currently a member of the cluster. Check "cman_tool services" on all
nodes; every group should be in the state "none". If it is not, there is
probably a membership issue.

> our cluster.conf is consistent across all devices (listed below).
>
> so i thought an fsck would fix this, then i get:
>
> [root@DBT1 ~]# fsck.gfs2 -fnp /dev/NEWvg/NEWlvTemp
> (snippage)
> RG #4909212 (0x4ae89c) free count inconsistent: is 16846 should be 17157
> Resource group counts updated
> Unlinked block 8639983 (0x83d5ef) bitmap fixed.
> RG #8639976 (0x83d5e8) free count inconsistent: is 65411 should be 65412
> Inode count inconsistent: is 20 should be 19
> Resource group counts updated
> Pass5 complete
> The statfs file is wrong:
>
> Current statfs values:
> blocks: 43324224 (0x2951340)
> free: 38433917 (0x24a747d)
> dinodes: 21085 (0x525d)
>
> Calculated statfs values:
> blocks: 43324224 (0x2951340)
> free: 38466752 (0x24af4c0)
> dinodes: 21083 (0x525b)
> The statfs file was fixed.
>
> gfs2_fsck: bad write: Bad file descriptor on line 44 of file buf.c
>
> i read in https://bugzilla.redhat.com/show_bug.cgi?id=457557 that there
> is some way of fixing this with gfs2_edit - are there docs available?

There is a development version of fsck.gfs2 that I have had success
fixing several issues with. It can be found at:

http://people.redhat.com/rpeterso/Experimental/RHEL5.x/gfs2/

I can't comment on the gfs2_edit procedure; maybe someone else on the
list can say whether that is a better idea than the experimental gfs2
fsck.

> as we've been having fencing issues, i removed two servers (DBT2/DBT3)
> from the cluster fencing, and they are not active at this time. would
> this cause the mount issues?

I see you removed the fence devices from:

    <clusternode name="DBT3" nodeid="5" votes="1">
        <fence>
            <method name="1"/>
        </fence>

If there was a fence event on this node, I could see that as a cause for
not being able to mount GFS. Any time heartbeat is lost, all cluster
resources remain frozen until there is a successful fence; without a
fence device you should see failed fence messages all through the logs.
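For what it's worth, this is roughly the check I would run on DBT1 (and
the other nodes) before retrying the mount. These are the standard cman
commands on a RHEL5-era cluster, shown here only as a sketch of the
procedure, not output from your system:

[root@DBT1 ~]# cman_tool status      # quorum info; confirms cman is up on this node
[root@DBT1 ~]# cman_tool nodes       # every node you expect should show as a member
[root@DBT1 ~]# cman_tool services    # fence domain plus dlm/gfs groups;
                                     # each group's state should read "none"

If DBT1 is missing from the fence domain in that last listing, it would
match the "node not a member of the default fence domain" error above.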
> tia for any advice / guidance.
>
> yvette
>
> our cluster.conf:
>
> <?xml version="1.0"?>
> <cluster alias="DBT0_DBT1_HA" config_version="85" name="DBT0_DBT1_HA">
>     <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="1"/>
>     <clusternodes>
>         <clusternode name="DBT0" nodeid="1" votes="3">
>             <fence>
>                 <method name="1">
>                     <device name="DBT0_ILO2"/>
>                 </method>
>             </fence>
>         </clusternode>
>         <clusternode name="DBT1" nodeid="2" votes="3">
>             <fence>
>                 <method name="1">
>                     <device name="DBT1_ILO2"/>
>                 </method>
>             </fence>
>         </clusternode>
>         <clusternode name="DEV" nodeid="3" votes="3">
>             <fence>
>                 <method name="1">
>                     <device name="DEV_ILO2"/>
>                 </method>
>             </fence>
>         </clusternode>
>         <clusternode name="DBT2" nodeid="4" votes="1">
>             <fence>
>                 <method name="1"/>
>             </fence>
>         </clusternode>
>         <clusternode name="DBT3" nodeid="5" votes="1">
>             <fence>
>                 <method name="1"/>
>             </fence>
>         </clusternode>
>     </clusternodes>
>     <cman/>
>     <fencedevices>
>         <fencedevice agent="fence_ilo" hostname="192.168.200.140" login="foo" name="DBT0_ILO2" passwd="foo"/>
>         <fencedevice agent="fence_ilo" hostname="192.168.200.150" login="foo" name="DEV_ILO2" passwd="foo"/>
>         <fencedevice agent="fence_ilo" hostname="192.168.200.141" login="foo" name="DBT1_ILO2" passwd="foo"/>
>     </fencedevices>
>     <rm>
>         <failoverdomains/>
>         <resources>
>             <clusterfs device="/dev/foo0vg/foo0vol002" force_unmount="1" fsid="19150" fstype="gfs2" mountpoint="/foo0vol002" name="foo0vol002" options="data=writeback" self_fence="0"/>
>             <clusterfs device="/dev/foo0vg/foo0lvvol003" force_unmount="1" fsid="51633" fstype="gfs2" mountpoint="/foo0vol003" name="foo0vol003" options="data=writeback" self_fence="0"/>
>             <clusterfs device="/dev/foo0vg/foo0lvvol004" force_unmount="1" fsid="36294" fstype="gfs2" mountpoint="/foo0vol004" name="foo0vol004" options="data=writeback" self_fence="0"/>
>             <clusterfs device="/dev/foo0vg/foo0vol005" force_unmount="1" fsid="48920" fstype="gfs2" mountpoint="/foo0vol005" name="foo0vol005" options="noatime,noquota,data=writeback" self_fence="0"/>
>             <clusterfs device="/dev/foo1vg/foo1lvvol000" force_unmount="1" fsid="24235" fstype="gfs2" mountpoint="/foo0vol000" name="foo0vol000" options="data=ordered" self_fence="0"/>
>             <clusterfs device="/dev/foo1vg/foo1lvvol001" force_unmount="1" fsid="34088" fstype="gfs2" mountpoint="/foo0vol001" name="foo0vol001" options="data=ordered" self_fence="0"/>
>         </resources>
>     </rm>
>     <totem consensus="4800" join="60" token="10000" token_retransmits_before_loss_const="20"/>
>     <dlm plock_ownership="1" plock_rate_limit="0"/>
>     <gfs_controld plock_rate_limit="0"/>
> </cluster>

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster