Over the past couple of weeks we have been seeing the following.
The cluster fences a node for missing too many heartbeats. The node goes
away, but no other node in the cluster tries to acquire its part of the
lock.
The fenced node does come back up and rejoins the cluster, but in the
meantime there is still a lock on a shared fs, and this ends in a load so
high that nobody can log in.
Sep 16 15:06:37 clu-V kernel: CMAN: node clu-III has been removed from the cluster : Missed too many heartbeats
Sep 16 15:09:07 clu-V kernel: CMAN: node clu-III rejoining
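To see how long a node stays out before it rejoins, the two CMAN messages can be paired up per node. The following is only a rough sketch: it assumes each syslog entry is on a single line, as it is in /var/log/messages, and that the message wording matches the excerpt above. The function names and the year default are made up for illustration (syslog timestamps carry no year).

```python
import re
from datetime import datetime

# Patterns follow the CMAN kernel messages quoted in this mail.
REMOVED = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ kernel: CMAN: node (\S+) "
    r"has been removed from the cluster : (.*)$"
)
REJOIN = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ kernel: CMAN: node (\S+) rejoining"
)


def parse_ts(text, year=2006):
    """Parse a syslog timestamp; the year is an assumption, syslog omits it."""
    return datetime.strptime(f"{year} {text}", "%Y %b %d %H:%M:%S")


def rejoin_gaps(lines, year=2006):
    """Return (node, reason, seconds) for each removal followed by a rejoin."""
    removed = {}  # node name -> (removal time, reason)
    gaps = []
    for line in lines:
        m = REMOVED.match(line)
        if m:
            removed[m.group(2)] = (parse_ts(m.group(1), year), m.group(3))
            continue
        m = REJOIN.match(line)
        if m and m.group(2) in removed:
            t0, reason = removed.pop(m.group(2))
            delta = parse_ts(m.group(1), year) - t0
            gaps.append((m.group(2), reason, delta.total_seconds()))
    return gaps


SAMPLE = [
    "Sep 16 15:06:37 clu-V kernel: CMAN: node clu-III has been removed "
    "from the cluster : Missed too many heartbeats",
    "Sep 16 15:09:07 clu-V kernel: CMAN: node clu-III rejoining",
]

gaps = rejoin_gaps(SAMPLE)
for node, reason, seconds in gaps:
    print(f"{node}: out for {seconds:.0f}s ({reason})")
```

For the two entries above this reports that clu-III was out of the cluster for 150 seconds.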
After a cluster restart everything is fine.
However, when I manually issue fence_node <nodename>, I do get these
messages from the other nodes trying to acquire part of the dlm:
tail /var/log/messages
Sep 18 01:22:02 clu-X kernel: GFS: fsid=mail:homes.1: jid=4: Looking at journal...
Sep 18 01:22:02 clu-X kernel: GFS: fsid=mail:shared.1: jid=4: Trying to acquire journal lock...
Sep 18 01:22:02 clu-X kernel: GFS: fsid=mail:mailbox.1: jid=4: Busy
Sep 18 01:22:02 clu-X kernel: GFS: fsid=mail:shared.1: jid=4: Busy
Sep 18 01:22:02 clu-X kernel: GFS: fsid=mail:homes.1: jid=4: Acquiring the transaction lock...
Sep 18 01:22:02 clu-X kernel: GFS: fsid=mail:homes.1: jid=4: Replaying journal...
Sep 18 01:22:02 clu-X kernel: GFS: fsid=mail:homes.1: jid=4: Replayed 1 of 2 blocks
Sep 18 01:22:02 clu-X kernel: GFS: fsid=mail:homes.1: jid=4: replays = 1, skips = 1, sames = 0
Sep 18 01:22:02 clu-X kernel: GFS: fsid=mail:homes.1: jid=4: Journal replayed in 1s
Sep 18 01:22:02 clu-X kernel: GFS: fsid=mail:homes.1: jid=4: Done
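To check on each node whether journal recovery was attempted at all after a fence, the last GFS message per filesystem and journal id can be collected. This is a small sketch under the same assumptions: one syslog entry per line, and message wording as in the excerpt above. The function name last_status is made up for illustration.

```python
import re

# Matches the GFS recovery messages quoted in this mail:
#   ... kernel: GFS: fsid=<fsid>: jid=<n>: <message>
GFS = re.compile(r"kernel: GFS: fsid=(\S+): jid=(\d+): (.*)$")


def last_status(lines):
    """Map (fsid, jid) to the last recovery message seen for that journal."""
    status = {}
    for line in lines:
        m = GFS.search(line)
        if m:
            status[(m.group(1), int(m.group(2)))] = m.group(3)
    return status


PREFIX = "Sep 18 01:22:02 clu-X kernel: GFS: "
SAMPLE = [
    PREFIX + "fsid=mail:homes.1: jid=4: Looking at journal...",
    PREFIX + "fsid=mail:shared.1: jid=4: Trying to acquire journal lock...",
    PREFIX + "fsid=mail:mailbox.1: jid=4: Busy",
    PREFIX + "fsid=mail:shared.1: jid=4: Busy",
    PREFIX + "fsid=mail:homes.1: jid=4: Replaying journal...",
    PREFIX + "fsid=mail:homes.1: jid=4: Journal replayed in 1s",
    PREFIX + "fsid=mail:homes.1: jid=4: Done",
]

status = last_status(SAMPLE)
for (fsid, jid), message in sorted(status.items()):
    print(f"{fsid} jid={jid}: {message}")
```

An empty result on every surviving node after a fence would match the behaviour described at the top of this mail, where no node tries to take over the journal.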
Has anyone had this kind of problem?
I should mention that this happened over the weekend or at night, when
there is no significant load on the cluster.
The GFS version is cvs-20060714.
--
Ivan Pantovic, System Engineer
-----
YUnet International http://www.eunet.yu
Dubrovacka 35/III, 11000 Belgrade
Tel: +381 11 311 9901; Fax: +381 11 311 9901; Mob: +381 63 302 288
-----