cluster failure ...

Over the past couple of weeks we have been seeing the following.

The cluster fences a node for missing too many heartbeats, and the node goes away. No other node in the cluster tries to acquire its part of the locks. The fenced node does come back up and rejoins the cluster, but in the meantime there is a lock held on a shared filesystem, and it ends in such a high load that nobody can log in.

Sep 16 15:06:37 clu-V kernel: CMAN: node clu-III has been removed from the cluster : Missed too many heartbeats
Sep 16 15:09:07 clu-V kernel: CMAN: node clu-III rejoining
After a cluster restart everything is fine.

However, when I manually issue fence_node <nodename>, I do get these messages from the other nodes trying to acquire its part of the DLM.

tail /var/log/messages
Sep 18 01:22:02 clu-X kernel: GFS: fsid=mail:homes.1: jid=4: Looking at journal...
Sep 18 01:22:02 clu-X kernel: GFS: fsid=mail:shared.1: jid=4: Trying to acquire journal lock...
Sep 18 01:22:02 clu-X kernel: GFS: fsid=mail:mailbox.1: jid=4: Busy
Sep 18 01:22:02 clu-X kernel: GFS: fsid=mail:shared.1: jid=4: Busy
Sep 18 01:22:02 clu-X kernel: GFS: fsid=mail:homes.1: jid=4: Acquiring the transaction lock...
Sep 18 01:22:02 clu-X kernel: GFS: fsid=mail:homes.1: jid=4: Replaying journal...
Sep 18 01:22:02 clu-X kernel: GFS: fsid=mail:homes.1: jid=4: Replayed 1 of 2 blocks
Sep 18 01:22:02 clu-X kernel: GFS: fsid=mail:homes.1: jid=4: replays = 1, skips = 1, sames = 0
Sep 18 01:22:02 clu-X kernel: GFS: fsid=mail:homes.1: jid=4: Journal replayed in 1s
Sep 18 01:22:02 clu-X kernel: GFS: fsid=mail:homes.1: jid=4: Done
Has anyone else had this kind of problem?

I should mention that this happened over a weekend or at night, when there is no significant load on the cluster.
The GFS version is cvs-20060714.
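
To check whether any journal recovery was attempted between a node being removed and rejoining, something like the following Python sketch could be run over /var/log/messages. It is only an illustration: the regular expressions are modelled on the log lines quoted above and may not match other CMAN/GFS versions, the helper names (parse_time, fence_events) are made up for this example, and the year 2006 is an assumption because syslog timestamps carry no year. The log does not say which jid belongs to which node, so a recovery attempt is simply attributed to every node that is currently removed.

```python
import re
from datetime import datetime

# Patterns modelled on the quoted log lines (assumption: the wording may
# differ between CMAN/GFS versions).
REMOVED = re.compile(r"CMAN: node (\S+) has been removed from the cluster")
REJOIN = re.compile(r"CMAN: node (\S+) rejoining")
RECOVERY = re.compile(r"GFS: fsid=(\S+): jid=(\d+): Trying to acquire journal lock")


def parse_time(line, year=2006):
    # Syslog timestamps have no year; 2006 is an assumption.
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def fence_events(lines):
    """Return one dict per removal, with any recovery attempts seen before the rejoin."""
    events = []
    still_out = {}  # node name -> event that has no rejoin yet
    for line in lines:
        m = REMOVED.search(line)
        if m:
            ev = {"node": m.group(1), "removed": parse_time(line),
                  "rejoined": None, "recoveries": []}
            events.append(ev)
            still_out[m.group(1)] = ev
            continue
        m = REJOIN.search(line)
        if m and m.group(1) in still_out:
            still_out.pop(m.group(1))["rejoined"] = parse_time(line)
            continue
        m = RECOVERY.search(line)
        if m:
            # jid-to-node mapping is not in the log, so credit every removed node.
            for ev in still_out.values():
                ev["recoveries"].append(m.group(1))
    return events


sample = [
    "Sep 16 15:06:37 clu-V kernel: CMAN: node clu-III has been removed from the cluster : Missed too many heartbeats",
    "Sep 16 15:09:07 clu-V kernel: CMAN: node clu-III rejoining",
]
for ev in fence_events(sample):
    down = (ev["rejoined"] - ev["removed"]).total_seconds() if ev["rejoined"] else None
    print(ev["node"], "down for", down, "s; recovery attempts:", ev["recoveries"] or "none")
```

For the two CMAN lines quoted above this reports clu-III as removed for 150 seconds with no recovery attempts logged in between, which matches what we see. On a real system the list would come from open("/var/log/messages") instead of the sample.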

--
Ivan Pantovic, System Engineer
-----
YUnet International  http://www.eunet.yu
Dubrovacka 35/III,   11000 Belgrade
Tel: +381 11 311 9901;  Fax: +381 11 311 9901; Mob: +381 63 302 288
-----

