rhel4u7 gfs locking up - unable to obtain cluster lock

"Simmons, Dan A" <jds@xxxxxxxxxx> · Thu, 26 Mar 2009 18:00:19 -0400

Hi All,

I have a production RHEL4u7 GFS cluster that has locked up five times in the
last week.  The cluster consists of 12 nodes: 3 run Oracle RAC and the rest
run home-grown applications.  The system does heavy reads and writes to the
shared GFS disks.  The symptoms seem similar to those described in Bugzilla
247766: my cluster locks up and I am unable to do anything except reboot the
entire cluster.  Before the system locks up, one of the nodes logs the error
"unable to obtain cluster lock: connection timed out" in /var/log/messages,
but nothing else appears in the logs.  There are 4 GFS volumes.  The current
stats from the busiest volume are:

                                  locks 68763
                             locks held 33981
                          incore inodes 33778
                       metadata buffers 210
                        unlinked inodes 0
                              quota IDs 5
                     incore log buffers 0
                         log space used 0.34%
              meta header cache entries 0
                     glock dependencies 0
                 glocks on reclaim list 0
                              log wraps 17
                   outstanding LM calls 0
                  outstanding BIO calls 0
                       fh2dentry misses 0
                       glocks reclaimed 41083300
                         glock nq calls 39290298
                         glock dq calls 26025821
                   glock prefetch calls 34071947
                          lm_lock calls 54069538
                        lm_unlock calls 40805646
                           lm callbacks 94932089
                     address operations 2335588
                      dentry operations 4683578
                      export operations 0
                        file operations 3179652
                       inode operations 9595976
                       super operations 39907494
                          vm operations 0
                        block I/O reads 34785108
                       block I/O writes 344510
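
For comparing these numbers between nodes or between snapshots, here is a
minimal Python sketch. It assumes the listing above is the text output of
"gfs_tool counters <mountpoint>" saved to a string. The function names
parse_gfs_counters and summarize are hypothetical, and the derived figures are
plain arithmetic on the counters, not documented health thresholds.

```python
import re

# Matches "<label> <number>" with an optional trailing "%", e.g.
# "   locks held 33981" or "   log space used 0.34%".
_LINE = re.compile(r"^\s*(.+?)\s+(\d+(?:\.\d+)?)(%?)\s*$")


def parse_gfs_counters(text):
    """Turn a counters listing into {label: value}.

    Integer counters become int; percentages and decimals become float.
    Lines that do not look like "<label> <number>" are skipped.
    """
    counters = {}
    for line in text.splitlines():
        match = _LINE.match(line)
        if not match:
            continue
        label, number, percent = match.groups()
        if percent or "." in number:
            counters[label] = float(number)
        else:
            counters[label] = int(number)
    return counters


def summarize(counters):
    """Derive a few figures that are easy to compare across snapshots."""
    return {
        # Fraction of known locks that are currently held.
        "held_fraction": counters["locks held"] / counters["locks"],
        # Lock requests sent to the lock manager that were not yet unlocked.
        "lm_lock_minus_unlock": (
            counters["lm_lock calls"] - counters["lm_unlock calls"]
        ),
        # Rough read/write mix of block I/O.
        "reads_per_write": (
            counters["block I/O reads"] / counters["block I/O writes"]
        ),
    }
```

Calling summarize(parse_gfs_counters(text)) on snapshots taken a few minutes
apart would show whether the number of held locks keeps growing in the run-up
to a lockup.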

I would be grateful for any advice, especially regarding locks and tuning.  I
am tempted to set glock_purge to 50, which was described as a fix for the
RHEL4u4 locking problem, but I worry that this might make things worse.

The specifics for the system are as follows:
RHEL4u7, SMP kernel 2.6.9-78.0.1
gfs 6.1.18-1
gfs-kernel-smp 2.6.9-80.9
rgmanager 1.9.80-1
cman 1.0.24-1
ccs 1.0.12-1
magma 1.0.8-1
magma-plugin 1.0.14-1
lvm2-cluster 2.02.37-3
fence 1.32.63-1

J. Dan 

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster