This is worse than I thought. The entire cluster hangs when a restart command is issued from the Conga - lucy box. I tried bringing the gfs service down on node2 (lucy) with: service gfs stop (we are not running rgmanager), and I got:
FATAL: Module gfs is in use.
On node3:
service gfs status:
Configured GFS mountpoints:
/lvm_test1
/lvm_test2
Active GFS mountpoints:
/lvm_test1
/lvm_test2
service gfs stop:
Unmounting GFS filesystems: (hangs)
node2 - .175
node3 - .78
node4 - .79
All nodes are configured on the same segment.
These are the messages from node3, starting from the point where I tried to restart the cluster:
Sep 26 09:00:38 dev03 openais[8692]: [TOTEM] entering GATHER state from 12.
Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] entering GATHER state from 0.
Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] Creating commit token because I am the rep.
Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] Saving state aru 1e1 high seq received 1e1
Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] Storing new sequence id for ring 454
Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] entering COMMIT state.
Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] entering RECOVERY state.
Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] position [0] member xxx.xxx.xxx.78:
Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] previous ring seq 1104 rep xxx.xxx.xxx.78
Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] aru 1e1 high delivered 1e1 received flag 1
Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] position [1] member xxx.xxx.xxx.175:
Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] previous ring seq 1104 rep xxx.xxx.xxx.78
Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] aru 1e1 high delivered 1e1 received flag 1
Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] Did not need to originate any messages in recovery.
Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] Sending initial ORF token
Sep 26 09:00:43 dev03 openais[8692]: [CLM ] CLM CONFIGURATION CHANGE
Sep 26 09:00:43 dev03 kernel: dlm: closing connection to node 2
Sep 26 09:00:43 dev03 openais[8692]: [CLM ] New Configuration:
Sep 26 09:00:43 dev03 openais[8692]: [CLM ] r(0) ip(xxx.xxx.xxx.78)
Sep 26 09:00:43 dev03 openais[8692]: [CLM ] r(0) ip(xxx.xxx.xxx.175)
Sep 26 09:00:43 dev03 openais[8692]: [CLM ] Members Left:
Sep 26 09:00:43 dev03 openais[8692]: [CLM ] r(0) ip(xxx.xxx.xxx.79)
Sep 26 09:00:43 dev03 openais[8692]: [CLM ] Members Joined:
Sep 26 09:00:43 dev03 openais[8692]: [CLM ] CLM CONFIGURATION CHANGE
Sep 26 09:00:43 dev03 openais[8692]: [CLM ] New Configuration:
Sep 26 09:00:43 dev03 openais[8692]: [CLM ] r(0) ip(xxx.xxx.xxx.78)
Sep 26 09:00:43 dev03 openais[8692]: [CLM ] r(0) ip(xxx.xxx.xxx.175)
Sep 26 09:00:43 dev03 openais[8692]: [CLM ] Members Left:
Sep 26 09:00:43 dev03 openais[8692]: [CLM ] Members Joined:
Sep 26 09:00:43 dev03 openais[8692]: [SYNC ] This node is within the primary component and will provide service.
Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] entering OPERATIONAL state.
Sep 26 09:00:43 dev03 openais[8692]: [CLM ] got nodejoin message xxx.xxx.xxx.78
Sep 26 09:00:43 dev03 openais[8692]: [CLM ] got nodejoin message xxx.xxx.xxx.175
Sep 26 09:00:43 dev03 openais[8692]: [CPG ] got joinlist message from node 3
Sep 26 09:00:43 dev03 fenced[8710]: fencing deferred to fenmrdev02.maritz.com
Sep 26 09:00:43 dev03 openais[8692]: [CPG ] got joinlist message from node 1
Sep 26 09:00:45 dev03 kernel: GFS: fsid=test1_cluster:gfs_fs1.2: jid=1: Trying to acquire journal lock...
Sep 26 09:00:45 dev03 kernel: GFS: fsid=test1_cluster:gfs_fs1.2: jid=1: Busy
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] entering GATHER state from 11.
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] Creating commit token because I am the rep.
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] Saving state aru 31 high seq received 31
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] Storing new sequence id for ring 458
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] entering COMMIT state.
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] entering RECOVERY state.
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] position [0] member xxx.xxx.xxx.78:
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] previous ring seq 1108 rep xxx.xxx.xxx.78
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] aru 31 high delivered 31 received flag 1
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] position [1] member xxx.xxx.xxx.79:
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] previous ring seq 1108 rep xxx.xxx.xxx.79
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] aru 9 high delivered 9 received flag 1
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] position [2] member xxx.xxx.xxx.175:
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] previous ring seq 1108 rep xxx.xxx.xxx.78
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] aru 31 high delivered 31 received flag 1
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] Did not need to originate any messages in recovery.
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] Sending initial ORF token
Sep 26 09:02:37 dev03 openais[8692]: [CLM ] CLM CONFIGURATION CHANGE
Sep 26 09:02:37 dev03 openais[8692]: [CLM ] New Configuration:
Sep 26 09:02:37 dev03 openais[8692]: [CLM ] r(0) ip(xxx.xxx.xxx.78)
Sep 26 09:02:37 dev03 openais[8692]: [CLM ] r(0) ip(xxx.xxx.xxx.175)
Sep 26 09:02:37 dev03 openais[8692]: [CLM ] Members Left:
Sep 26 09:02:37 dev03 openais[8692]: [CLM ] Members Joined:
Sep 26 09:02:37 dev03 openais[8692]: [CLM ] CLM CONFIGURATION CHANGE
Sep 26 09:02:37 dev03 openais[8692]: [CLM ] New Configuration:
Sep 26 09:02:37 dev03 openais[8692]: [CLM ] r(0) ip(xxx.xxx.xxx.78)
Sep 26 09:02:37 dev03 openais[8692]: [CLM ] r(0) ip(xxx.xxx.xxx.79)
Sep 26 09:02:37 dev03 openais[8692]: [CLM ] r(0) ip(xxx.xxx.xxx.175)
Sep 26 09:02:37 dev03 openais[8692]: [CLM ] Members Left:
Sep 26 09:02:37 dev03 openais[8692]: [CLM ] Members Joined:
Sep 26 09:02:37 dev03 openais[8692]: [CLM ] r(0) ip(xxx.xxx.xxx.79)
Sep 26 09:02:37 dev03 openais[8692]: [SYNC ] This node is within the primary component and will provide service.
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] entering OPERATIONAL state.
Sep 26 09:02:37 dev03 openais[8692]: [CLM ] got nodejoin message xxx.xxx.xxx.78
Sep 26 09:02:37 dev03 openais[8692]: [CLM ] got nodejoin message xxx.xxx.xxx.79
Sep 26 09:02:37 dev03 openais[8692]: [CLM ] got nodejoin message xxx.xxx.xxx.175
Sep 26 09:02:37 dev03 openais[8692]: [CPG ] got joinlist message from node 3
Sep 26 09:02:37 dev03 openais[8692]: [CPG ] got joinlist message from node 1
Sep 26 09:02:43 dev03 kernel: dlm: connecting to 2
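The membership changes are easy to lose among the TOTEM lines above. As a reading aid only, here is a minimal Python sketch that pulls the "Members Left" / "Members Joined" entries out of openais CLM messages in this syslog format. The function name membership_events and the regular expressions are my own assumptions based on the lines pasted here; they are not part of openais or any cluster tool.

```python
import re

# Assumed line shape, taken from the log above, e.g.:
#   Sep 26 09:00:43 dev03 openais[8692]: [CLM ] Members Left:
CLM_LINE = re.compile(
    r"^(\w{3}\s+\d+\s+[\d:]+)\s+\S+\s+openais\[\d+\]:\s+\[CLM\s*\]\s+(.*)$"
)
# Address lines look like: r(0) ip(xxx.xxx.xxx.79)
IP_LINE = re.compile(r"r\(\d+\)\s+ip\(([^)]+)\)")


def membership_events(lines):
    """Return (timestamp, 'left' | 'joined', address) tuples.

    Only addresses listed under "Members Left:" or "Members Joined:" are
    reported; addresses under "New Configuration:" are ignored.
    """
    events = []
    section = None  # list that the following r(0) ip(...) lines belong to
    for line in lines:
        match = CLM_LINE.match(line.strip())
        if not match:
            continue  # TOTEM, SYNC, kernel, ... lines are skipped
        stamp, text = match.groups()
        if text.startswith("Members Left"):
            section = "left"
        elif text.startswith("Members Joined"):
            section = "joined"
        else:
            address = IP_LINE.search(text)
            if address and section:
                events.append((stamp, section, address.group(1)))
            elif not address:
                # "New Configuration:", "CLM CONFIGURATION CHANGE",
                # "got nodejoin message ..." all end the current list.
                section = None
    return events
```

Fed the lines above (for example from a saved copy of the syslog file), it should report .79 leaving at 09:00:43 and joining again at 09:02:37, which matches what I read by eye.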
--
Alan A.
---------- Forwarded message ----------
From: Alan A <alan.zg@xxxxxxxxx>
Date: Thu, Sep 25, 2008 at 2:04 PM
Subject: GFS volume hangs on 3 nodes after gfs_grow
To: linux clustering <linux-cluster@xxxxxxxxxx>
Hi all!
I have a 3-node test cluster using SCSI fencing and GFS. I created 2 GFS logical volumes, lvm1 and lvm2, each using 5GB on a 10GB disk. While testing the command-line tools, I ran lvextend -L +1G /devicename to bring lvm2 to 6GB. This went through without any problems. Then I ran gfs_grow /mountpoint and the volume became inaccessible. Any command that tries to access the volume hangs, and umount returns: /sbin/umount.gfs: /lvm2: device is busy.
A few questions: Since I have two volumes on this cluster and lvm1 works just fine, are there any suggestions for unmounting lvm2 so that I can try to fix it?
Is gfs_grow bug-free or not (use/do not use)?
Is there any way, other than restarting the cluster/nodes, to get lvm2 back into an operational state?
--
Alan A.
-- Linux-cluster mailing list Linux-cluster@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/linux-cluster