On Wed, 2009-11-11 at 11:49 -0500, Lon H. Hohberger wrote:
> On Thu, 2009-11-05 at 15:28 +0100, Gianluca Cecchi wrote:
>
> > Nov 5 12:52:53 mork clurgmgrd[2633]: <notice> Member 2 shutting down
> > Nov 5 12:52:57 mork qdiskd[2214]: <info> Node 2 shutdown
>
> > Nov 5 12:55:41 mork openais[2185]: [TOTEM] The token was lost in the
> > OPERATIONAL state.
>
> That's very interesting.  It looks like what happened to cause the
> state change failures was the huge lag time between when rgmanager sent
> its "good bye kiss" and the time openais noticed the node was offline.
> The timeout was large enough that rgmanager gave up.
>
> This isn't actually the quorum disk master election problem at all...
> It's also very strange.
>
> - rgmanager should have known this was unnecessary.  The other node said
>   it was going away.
> - cman probably should have caused a transition sooner, I think (??)

So... rgmanager treats a node which sends the 'EXITING' message as
offline.  It makes no sense why it would do this and subsequently fail
to update the cluster state.

        case RG_EXITING:
                if (!member_online(msg_hdr->gh_arg1))
                        break;

                logt_print(LOG_NOTICE, "Member %d shutting down\n",
                           msg_hdr->gh_arg1);
                member_set_state(msg_hdr->gh_arg1, 0);
                node_event_q(0, msg_hdr->gh_arg1, 0, 1);
                break;

You said in your previous mail that mindy shut down cleanly -- so I'm
really stumped...

-- Lon

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
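
For anyone who wants to poke at the RG_EXITING path quoted above outside of rgmanager, here is a minimal standalone sketch of the same control flow. It is not the rgmanager source: the membership table, the event counter, the value of RG_EXITING, the handle_msg() wrapper and the parameter names given to node_event_q() are all assumptions made up for illustration. Only the order of the calls mirrors the quoted snippet.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_NODES  16
#define RG_EXITING 1    /* assumed value; the real constant is defined by rgmanager */

/* Hypothetical stand-ins for rgmanager's membership table and event queue. */
static int node_online[MAX_NODES];
static int queued_events;

static int member_online(int nodeid)
{
    assert(nodeid >= 0 && nodeid < MAX_NODES);
    return node_online[nodeid];
}

static void member_set_state(int nodeid, int state)
{
    assert(nodeid >= 0 && nodeid < MAX_NODES);
    node_online[nodeid] = state;
}

/* Stub: takes four arguments like the quoted call, but only counts events.
 * The parameter names are guesses, not taken from the real prototype. */
static void node_event_q(int local, int nodeid, int state, int clean)
{
    (void)local; (void)nodeid; (void)state; (void)clean;
    queued_events++;
}

/* Hypothetical dispatcher around the quoted case.
 * Returns 1 if the message changed membership state, 0 if it was ignored. */
static int handle_msg(int cmd, int nodeid)
{
    switch (cmd) {
    case RG_EXITING:
        if (!member_online(nodeid))
            break;                      /* already marked offline: ignore */

        printf("Member %d shutting down\n", nodeid);
        member_set_state(nodeid, 0);    /* mark the node offline right away */
        node_event_q(0, nodeid, 0, 1);  /* queue the node-down event */
        return 1;
    }
    return 0;
}
```

In this sketch the first EXITING message from an online node marks it offline and queues exactly one event, and a second EXITING message from the same node is ignored. That matches the reading above that the node should already be treated as offline by the time openais notices it is gone.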