Hi,
I have a two-node cluster running GFS built from the RHEL4 CVS tag (pulled on
June 1st).
I have several GFS LVMs; one of them now uses 16 GB of storage and 371659
inodes (according to df -k and df -i).
The other GFS LVMs use fewer inodes.
When only one node is running, everything is fine: I can mount and access all the GFS LVMs.
The problem is that when both nodes are running and I try to mount that
particular LVM, the node that tries to mount it (lincluster2) logs these
messages to syslog:
Jun 1 02:46:41 lincluster2 GFS: Trying to join cluster "lock_dlm",
"lincluster:newapp"
Jun 1 02:46:43 lincluster2 GFS: fsid=lincluster:newapp.1: Joined
cluster. Now mounting FS...
Jun 1 02:46:43 lincluster2 GFS: fsid=lincluster:newapp.1: jid=1: Trying
to acquire journal lock...
Jun 1 02:46:43 lincluster2 GFS: fsid=lincluster:newapp.1: jid=1:
Looking at journal...
Jun 1 02:46:43 lincluster2 GFS: fsid=lincluster:newapp.1: jid=1: Done
Jun 1 02:46:43 lincluster2 GFS: fsid=lincluster:newapp.1: Scanning for
log elements...
and then it seems to hang. I assume it is using all of its CPU time scanning
log elements. That would be fine if the other node could still tell that
lincluster2 was alive, but it can't:
Jun 1 02:51:33 lincluster1 kernel: CMAN: removing node lincluster2 from
the cluster : Missed too many heartbeats
Jun 1 02:51:33 lincluster1 fenced[4365]: lincluster2 not a cluster
member after 0 sec post_fail_delay
Jun 1 02:51:33 lincluster1 fenced[4365]: fencing node "lincluster2"
Jun 1 02:51:38 lincluster1 fenced[4365]: fence "lincluster2" success
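For reference, this is roughly how I watch the membership from lincluster1
while lincluster2 is stuck in the scan (output omitted):

  # on lincluster1, while lincluster2 is trying to mount
  cman_tool nodes              # cluster membership as CMAN sees it
  cman_tool status             # quorum and node state
  cat /proc/cluster/services   # fence domain / DLM lockspace / GFS mount groups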
It seems that lincluster2 is so busy scanning log elements that it cannot
even send CMAN heartbeats, which makes lincluster1 think lincluster2 is
dead, so it fences it and reboots it.
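The only workaround I can think of (and I am not sure it is the right fix,
or that I have the attribute names exactly right for the RHEL4 code) is to
give CMAN and fenced more slack in /etc/cluster/cluster.conf, something like
the lines below; the 60 and 30 second values are just guesses:

  <!-- inside the existing <cluster> section of /etc/cluster/cluster.conf -->
  <!-- deadnode_timeout (seconds): how long CMAN waits for heartbeats
       before declaring a node dead; the default is 21, I believe -->
  <cman deadnode_timeout="60"/>
  <!-- post_fail_delay (seconds): how long fenced waits after a node is
       declared failed before fencing it; mine is currently 0 -->
  <fence_daemon post_fail_delay="30" post_join_delay="3"/>

But that would only hide the symptom rather than stop the scan from starving
the heartbeat in the first place.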
Any ideas how to fix this?
Regards,
Fajar