Fwd: GFS volume hangs on 3 nodes after gfs_grow

"Alan A" <alan.zg@xxxxxxxxx> · Fri, 26 Sep 2008 09:43:39 -0500

This is worse than I thought. The entire cluster hangs when a restart command is issued from the Conga - lucy box. I tried bringing the gfs service down on node2 (lucy) with "service gfs stop" (we are not running rgmanager), and I got:

FATAL: Module gfs is in use.

On node3:
service gfs status:
Configured GFS mountpoints: 
/lvm_test1
/lvm_test2
Active GFS mountpoints: 
/lvm_test1
/lvm_test2

service gfs stop:
Unmounting GFS filesystems:  (hangs)

node2 - .175
node3 - .78
node4 - .79
All nodes are configured on the same segment.

These are the messages from node3, starting from the point where I tried to restart the cluster:
Sep 26 09:00:38 dev03 openais[8692]: [TOTEM] entering GATHER state from 12. 

Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] entering GATHER state from 0. 
Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] Creating commit token because I am the rep. 
Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] Saving state aru 1e1 high seq received 1e1 

Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] Storing new sequence id for ring 454 
Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] entering COMMIT state. 
Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] entering RECOVERY state. 

Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] position [0] member xxx.xxx.xxx.78: 
Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] previous ring seq 1104 rep xxx.xxx.xxx.78 
Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] aru 1e1 high delivered 1e1 received flag 1 

Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] position [1] member xxx.xxx.xxx.175: 
Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] previous ring seq 1104 rep xxx.xxx.xxx.78 
Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] aru 1e1 high delivered 1e1 received flag 1 

Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] Did not need to originate any messages in recovery. 
Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] Sending initial ORF token 
Sep 26 09:00:43 dev03 openais[8692]: [CLM  ] CLM CONFIGURATION CHANGE 

Sep 26 09:00:43 dev03 kernel: dlm: closing connection to node 2
Sep 26 09:00:43 dev03 openais[8692]: [CLM  ] New Configuration: 
Sep 26 09:00:43 dev03 openais[8692]: [CLM  ]       r(0) ip(xxx.xxx.xxx.78)  
Sep 26 09:00:43 dev03 openais[8692]: [CLM  ]       r(0) ip(xxx.xxx.xxx.175)  

Sep 26 09:00:43 dev03 openais[8692]: [CLM  ] Members Left: 
Sep 26 09:00:43 dev03 openais[8692]: [CLM  ]       r(0) ip(xxx.xxx.xxx.79)  
Sep 26 09:00:43 dev03 openais[8692]: [CLM  ] Members Joined: 
Sep 26 09:00:43 dev03 openais[8692]: [CLM  ] CLM CONFIGURATION CHANGE 

Sep 26 09:00:43 dev03 openais[8692]: [CLM  ] New Configuration: 
Sep 26 09:00:43 dev03 openais[8692]: [CLM  ]       r(0) ip(xxx.xxx.xxx.78)  
Sep 26 09:00:43 dev03 openais[8692]: [CLM  ]       r(0) ip(xxx.xxx.xxx.175)  

Sep 26 09:00:43 dev03 openais[8692]: [CLM  ] Members Left: 
Sep 26 09:00:43 dev03 openais[8692]: [CLM  ] Members Joined: 
Sep 26 09:00:43 dev03 openais[8692]: [SYNC ] This node is within the primary component and will provide service. 

Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] entering OPERATIONAL state. 
Sep 26 09:00:43 dev03 openais[8692]: [CLM  ] got nodejoin message xxx.xxx.xxx.78 
Sep 26 09:00:43 dev03 openais[8692]: [CLM  ] got nodejoin message xxx.xxx.xxx.175 

Sep 26 09:00:43 dev03 openais[8692]: [CPG  ] got joinlist message from node 3 
Sep 26 09:00:43 dev03 fenced[8710]: fencing deferred to fenmrdev02.maritz.com
Sep 26 09:00:43 dev03 openais[8692]: [CPG  ] got joinlist message from node 1 

Sep 26 09:00:45 dev03 kernel: GFS: fsid=test1_cluster:gfs_fs1.2: jid=1: Trying to acquire journal lock...
Sep 26 09:00:45 dev03 kernel: GFS: fsid=test1_cluster:gfs_fs1.2: jid=1: Busy
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] entering GATHER state from 11. 

Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] Creating commit token because I am the rep. 
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] Saving state aru 31 high seq received 31 
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] Storing new sequence id for ring 458 

Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] entering COMMIT state. 
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] entering RECOVERY state. 
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] position [0] member xxx.xxx.xxx.78: 

Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] previous ring seq 1108 rep xxx.xxx.xxx.78 
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] aru 31 high delivered 31 received flag 1 
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] position [1] member xxx.xxx.xxx.79: 

Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] previous ring seq 1108 rep xxx.xxx.xxx.79 
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] aru 9 high delivered 9 received flag 1 
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] position [2] member xxx.xxx.xxx.175: 

Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] previous ring seq 1108 rep xxx.xxx.xxx.78 
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] aru 31 high delivered 31 received flag 1 
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] Did not need to originate any messages in recovery. 

Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] Sending initial ORF token 
Sep 26 09:02:37 dev03 openais[8692]: [CLM  ] CLM CONFIGURATION CHANGE 
Sep 26 09:02:37 dev03 openais[8692]: [CLM  ] New Configuration: 
Sep 26 09:02:37 dev03 openais[8692]: [CLM  ]       r(0) ip(xxx.xxx.xxx.78)  

Sep 26 09:02:37 dev03 openais[8692]: [CLM  ]       r(0) ip(xxx.xxx.xxx.175)  
Sep 26 09:02:37 dev03 openais[8692]: [CLM  ] Members Left: 
Sep 26 09:02:37 dev03 openais[8692]: [CLM  ] Members Joined: 
Sep 26 09:02:37 dev03 openais[8692]: [CLM  ] CLM CONFIGURATION CHANGE 

Sep 26 09:02:37 dev03 openais[8692]: [CLM  ] New Configuration: 
Sep 26 09:02:37 dev03 openais[8692]: [CLM  ]       r(0) ip(xxx.xxx.xxx.78)  
Sep 26 09:02:37 dev03 openais[8692]: [CLM  ]       r(0) ip(xxx.xxx.xxx.79)  

Sep 26 09:02:37 dev03 openais[8692]: [CLM  ]       r(0) ip(xxx.xxx.xxx.175)  
Sep 26 09:02:37 dev03 openais[8692]: [CLM  ] Members Left: 
Sep 26 09:02:37 dev03 openais[8692]: [CLM  ] Members Joined: 
Sep 26 09:02:37 dev03 openais[8692]: [CLM  ]       r(0) ip(xxx.xxx.xxx.79)  

Sep 26 09:02:37 dev03 openais[8692]: [SYNC ] This node is within the primary component and will provide service. 
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] entering OPERATIONAL state. 
Sep 26 09:02:37 dev03 openais[8692]: [CLM  ] got nodejoin message xxx.xxx.xxx.78 

Sep 26 09:02:37 dev03 openais[8692]: [CLM  ] got nodejoin message xxx.xxx.xxx.79 
Sep 26 09:02:37 dev03 openais[8692]: [CLM  ] got nodejoin message xxx.xxx.xxx.175 
Sep 26 09:02:37 dev03 openais[8692]: [CPG  ] got joinlist message from node 3 

Sep 26 09:02:37 dev03 openais[8692]: [CPG  ] got joinlist message from node 1 
Sep 26 09:02:43 dev03 kernel: dlm: connecting to 2
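
For anyone reading along, the membership changes buried in the openais output above can be pulled out with a short script. This is only a minimal sketch, assuming the syslog layout shown in this message (timestamp, host, openais[pid], then a [CLM  ] tag). The names parse_clm_changes and SAMPLE are made up for illustration and are not part of any cluster tool.

```python
import re

# Matches syslog lines emitted by openais with the [CLM  ] tag, in the
# layout shown in this message. Captures the timestamp and message text.
CLM_LINE = re.compile(
    r"^(\w{3}\s+\d+\s+[\d:]+)\s+\S+\s+openais\[\d+\]:\s+\[CLM\s*\]\s+(.*\S)\s*$"
)
IP_ENTRY = re.compile(r"ip\(([^)]+)\)")

# Section headers inside one configuration change block.
SECTION_HEADERS = {
    "New Configuration:": "members",
    "Members Left:": "left",
    "Members Joined:": "joined",
}


def parse_clm_changes(lines):
    """Return one dict per 'CLM CONFIGURATION CHANGE' block (hypothetical helper)."""
    events = []
    section = None
    for line in lines:
        match = CLM_LINE.match(line)
        if not match:
            continue  # kernel, fenced, TOTEM and other lines are ignored
        timestamp, text = match.groups()
        if text == "CLM CONFIGURATION CHANGE":
            events.append(
                {"time": timestamp, "members": [], "left": [], "joined": []}
            )
            section = None
        elif text in SECTION_HEADERS:
            section = SECTION_HEADERS[text]
        elif events and section:
            ip = IP_ENTRY.search(text)
            if ip:
                events[-1][section].append(ip.group(1))
    return events


# Excerpt copied from the node3 log above (trailing spaces dropped).
SAMPLE = """
Sep 26 09:00:43 dev03 openais[8692]: [CLM  ] CLM CONFIGURATION CHANGE
Sep 26 09:00:43 dev03 kernel: dlm: closing connection to node 2
Sep 26 09:00:43 dev03 openais[8692]: [CLM  ] New Configuration:
Sep 26 09:00:43 dev03 openais[8692]: [CLM  ]       r(0) ip(xxx.xxx.xxx.78)
Sep 26 09:00:43 dev03 openais[8692]: [CLM  ]       r(0) ip(xxx.xxx.xxx.175)
Sep 26 09:00:43 dev03 openais[8692]: [CLM  ] Members Left:
Sep 26 09:00:43 dev03 openais[8692]: [CLM  ]       r(0) ip(xxx.xxx.xxx.79)
Sep 26 09:00:43 dev03 openais[8692]: [CLM  ] Members Joined:
Sep 26 09:00:43 dev03 openais[8692]: [CLM  ] CLM CONFIGURATION CHANGE
Sep 26 09:00:43 dev03 openais[8692]: [CLM  ] New Configuration:
Sep 26 09:00:43 dev03 openais[8692]: [CLM  ]       r(0) ip(xxx.xxx.xxx.78)
Sep 26 09:00:43 dev03 openais[8692]: [CLM  ]       r(0) ip(xxx.xxx.xxx.175)
Sep 26 09:00:43 dev03 openais[8692]: [CLM  ] Members Left:
Sep 26 09:00:43 dev03 openais[8692]: [CLM  ] Members Joined:
"""

if __name__ == "__main__":
    for event in parse_clm_changes(SAMPLE.splitlines()):
        print(event["time"], "members:", event["members"],
              "left:", event["left"], "joined:", event["joined"])
```

Run against the full log, this should make it easier to see at a glance when .79 dropped out of the membership and when it rejoined. To use it on a real file, pass the open file object (for example open("/var/log/messages")) to parse_clm_changes instead of SAMPLE.splitlines().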

---------- Forwarded message ----------
From: Alan A <alan.zg@xxxxxxxxx>
Date: Thu, Sep 25, 2008 at 2:04 PM
Subject: GFS volume hangs on 3 nodes after gfs_grow
To: linux clustering <linux-cluster@xxxxxxxxxx>

Hi all!

I have a 3-node test cluster using SCSI fencing and GFS. I created two GFS logical volumes, lvm1 and lvm2, each using 5GB on 10GB disks. While testing the command line tools I ran lvextend -L +1G /devicename to bring lvm2 to 6GB. This went fine, without any problems. Then I ran gfs_grow /mountpoint and the volume became inaccessible. Any command that tries to access the volume hangs, and umount returns: /sbin/umount.gfs: /lvm2: device is busy.

A few questions: Since I have two volumes on this cluster and lvm1 works just fine, are there any suggestions for unmounting lvm2 so that I can try to fix it?
Is gfs_grow bug free or not (should I use it or avoid it)?
Is there any other way, besides restarting the cluster/nodes, to get lvm2 back into an operational state?
-- 
Alan A.
