On Fri, Sep 28, 2007 at 11:12:40AM +0200, Borgström Jonas wrote:
> Anyone with an idea why a "sleep 30" is needed for fenced to be able to
> join the fence group properly?
>
> Even though this workaround appears to work it would be nice to have a
> more solid solution. Since now I will need to remember to patch the init
> script every time it's updated.

We never got to the bottom of what the problem is AFAIK.

> > > 1190645954 client 3: dump    <--- Before killing prod-db1
> > > 1190645985 stop default
> > > 1190645985 start default 3 members 2
> > > 1190645985 do_recovery stop 2 start 3 finish 1
> > > 1190645985 finish default 3
> > > 1190646008 client 3: dump    <--- After killing prod-db1
> >
> > Node 1 isn't fenced here because it never completed joining the fence
> > group above. This is the problem we need to debug.

Here's what I suggested before to do that:

"A 'group_tool -v' here should show the state of the fence group still in
transition. Could you run that, plus a 'group_tool dump' at this point,
in addition to the 'dump fence' you have. And please run those commands
on both nodes."

Dave

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
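
As a rough sketch, the three outputs requested above could be captured on each node with a small shell function like the one below. The function name, output directory, and file names are made up for illustration, and 'dump fence' is assumed to mean 'group_tool dump fence'; nothing here is part of the cluster tools themselves.

```shell
#!/bin/sh
# Sketch only: save the group state of this node into a directory so the
# output from both nodes can be compared afterwards.
# collect_group_state and the file names are hypothetical choices.

collect_group_state() {
    outdir=$1
    mkdir -p "$outdir" || return 1

    # Bail out early if the tool is not available on this node.
    if ! command -v group_tool >/dev/null 2>&1; then
        echo "group_tool not found in PATH" >&2
        return 1
    fi

    # State of each group; a fence group still in transition should show here.
    group_tool -v > "$outdir/group_tool-v.txt" 2>&1
    # Debug buffer from groupd.
    group_tool dump > "$outdir/group_tool-dump.txt" 2>&1
    # Debug buffer from fenced (assumed to be the 'dump fence' already posted).
    group_tool dump fence > "$outdir/group_tool-dump-fence.txt" 2>&1
}
```

It would be run on both nodes right after the failed join, for example as `collect_group_state /tmp/group-state-$(hostname)`; the path is only an example.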