On Fri, 27 Mar 2009, Ben Yarwood wrote:

> Replaying a journal as below usually indicates a node has withdrawn from that
> file system, I believe. You should grep messages on all nodes for 'GFS'; if
> any node is reporting errors with this fs, then it will need rebooting/fencing
> before access to that fs can be achieved.

The failing node is fenced off. Here are the steps to reproduce the freeze of
the node:

- all nodes are running and are members of the cluster
- start the mailman queue manager: the node freezes
- the frozen node is fenced off by another member of the cluster
- I can see the log messages I quoted in my first mail:

  Mar 26 23:09:24 lxserv1 kernel: dlm: closing connection to node 1
  Mar 26 23:09:25 lxserv1 kernel: GFS: fsid=kfki:home.1: jid=3: Trying to acquire journal lock...
  [...]

- sometimes (but not always) the fencing machine freezes as well and is
  therefore fenced off in turn
- the third node has never frozen so far, so the cluster has remained quorate
- the fenced-off machines restart, rejoin the cluster and work normally until
  I start the mailman queue manager again

The daily backups of all the GFS file systems complete successfully, so I
assume it is not filesystem corruption.

Best regards,
Jozsef
--
E-mail : kadlec@xxxxxxxxxxxx, kadlec@xxxxxxxxxxxxxxxxx
PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address: KFKI Research Institute for Particle and Nuclear Physics
         H-1525 Budapest 114, POB. 49, Hungary

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster