On Mon, 30 Mar 2009, David Teigland wrote:

> On Fri, Mar 27, 2009 at 06:19:50PM +0100, Kadlecsik Jozsef wrote:
> >
> > Combing through the log files I found the following:
> >
> > Mar 27 13:31:56 lxserv0 fenced[3833]: web1-gfs not a cluster member after 0 sec post_fail_delay
> > Mar 27 13:31:56 lxserv0 fenced[3833]: fencing node "web1-gfs"
> > Mar 27 13:31:56 lxserv0 fenced[3833]: can't get node number for node e1??e1??
> > Mar 27 13:31:56 lxserv0 fenced[3833]: fence "web1-gfs" success
> >
> > The line saying "can't get node number for node e1??e1??" might be
> > innocent, but looks suspicious. Why fenced could not get the victim name?
>
> I've not seen that before, and I can't explain either how cman_get_node()
> could have failed or why it printed a garbage string. It's a non-essential
> bit of code, so that error should not be related to your problem.

Yes, it is surely not related to the freeze, but it is disturbing.

Hm, I believe there is an ordering issue in the function
dispatch_fence_agent: the variable victim_nodename is freed, but
update_cman is then called with the variable victim, which still points
to the just-freed victim_nodename. That could explain the garbage string
in the log.

Best regards,
Jozsef
--
E-mail : kadlec@xxxxxxxxxxxx, kadlec@xxxxxxxxxxxxxxxxx
PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address: KFKI Research Institute for Particle and Nuclear Physics
         H-1525 Budapest 114, POB. 49, Hungary

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
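
P.S. To make the suspected ordering issue concrete, here is a minimal sketch in C. It is not the actual fenced source: only the names dispatch_fence_agent, update_cman, victim and victim_nodename come from the code under discussion. The signatures, the last_reported buffer, and the body of update_cman are assumptions made purely for illustration. The sketch shows the safe order (use the alias first, free afterwards) and keeps the suspected faulty order in a comment only, since actually running it would be undefined behaviour.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: remembers the last node name handed to update_cman(). */
static char last_reported[64];

/* Hypothetical stand-in for update_cman(); the real function differs. */
static void update_cman(const char *victim)
{
    assert(victim != NULL);
    /* The string is read here, so the pointer must still be valid. */
    snprintf(last_reported, sizeof(last_reported), "%s", victim);
}

/* Simplified, hypothetical version of dispatch_fence_agent(). */
static int dispatch_fence_agent(const char *name)
{
    char *victim_nodename = malloc(strlen(name) + 1);
    const char *victim;

    if (victim_nodename == NULL)
        return -1;
    strcpy(victim_nodename, name);

    /* victim is only an alias for the buffer, not a second copy. */
    victim = victim_nodename;

    /*
     * Suspected faulty order:
     *
     *     free(victim_nodename);
     *     update_cman(victim);     reads memory that was just freed
     *
     * Safe order: use the alias first, release the buffer afterwards.
     */
    update_cman(victim);
    free(victim_nodename);
    return 0;
}
```

With the calls in this order, dispatch_fence_agent("web1-gfs") leaves "web1-gfs" in last_reported. With the order reversed, the contents read by update_cman are undefined, which would be consistent with a garbage node name appearing in the log.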