Re: What does FAIL_STOP_WAIT state mean for clvmd and rgmanager

Joel Heenan <joelh@xxxxxxxxxxxxxx> · Mon, 20 Sep 2010 16:21:29 +1000

I'm not sure; possibly it was from doing a "service cman restart".

I understand it's always preferable to reboot with cluster suite, but some of our physical hosts can take 20 minutes to do a full reboot, so I'm always looking for some way to fix them online.

Joel

On Fri, Sep 10, 2010 at 4:03 AM, Lon Hohberger <lhh@xxxxxxxxxx> wrote:

On Mon, 2010-08-23 at 17:58 +1000, Joel Heenan wrote:

> Can someone please explain what this means and what you can do to get
> out of it:
>
> [root@cluster-host ~]# group_tool -v
> type             level name       id       state node id local_done
> fence            0     default    00010003 JOIN_STOP_WAIT 1 100050001 1
> [1 1 2 3 4]
> dlm              1     clvmd      00020003 FAIL_STOP_WAIT 2 200030003 1
> [1 2 3 4]
> dlm              1     rgmanager  00030003 FAIL_STOP_WAIT 2 200030003 1
> [1 2 3 4]

It looks like fencing has not completed.  How do you have 2 node 1's in
the fencing group?
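
One way to spot both symptoms (groups that are not in a settled state, and a node id listed twice in a member list) is to scan the group_tool -v output with a small script. The following is only a sketch in Python: parse_groups and report are hypothetical helper names, the sample text is the output quoted in this thread with the wrapped lines rejoined, and it assumes a settled group reports its state as "none".

```python
import re
from collections import Counter

# Output quoted in this thread, with wrapped lines rejoined (assumption).
SAMPLE = """\
type             level name       id       state node id local_done
fence            0     default    00010003 JOIN_STOP_WAIT 1 100050001 1
[1 1 2 3 4]
dlm              1     clvmd      00020003 FAIL_STOP_WAIT 2 200030003 1
[1 2 3 4]
dlm              1     rgmanager  00030003 FAIL_STOP_WAIT 2 200030003 1
[1 2 3 4]
"""


def parse_groups(text):
    """Parse group_tool -v style text into a list of group dicts."""
    groups = []
    for line in text.splitlines():
        line = line.strip()
        # A bracketed line is the member list of the preceding group row.
        members = re.fullmatch(r"\[([\d\s]*)\]", line)
        if members and groups:
            groups[-1]["members"] = [int(n) for n in members.group(1).split()]
            continue
        fields = line.split()
        # Group rows: type, numeric level, name, id, state, optional extras.
        # The header row is skipped because its second field is not a number.
        if len(fields) >= 5 and fields[1].isdigit():
            groups.append({
                "type": fields[0],
                "name": fields[2],
                "state": fields[4],
                "members": [],
            })
    return groups


def report(groups):
    """Return human-readable warnings for unsettled groups and duplicate nodes."""
    problems = []
    for g in groups:
        label = f"{g['type']}:{g['name']}"
        # Assumption: "none" is the state shown when no change is pending.
        if g["state"] != "none":
            problems.append(f"{label} is in state {g['state']}")
        dups = sorted(n for n, c in Counter(g["members"]).items() if c > 1)
        if dups:
            problems.append(f"{label} lists node(s) {dups} more than once")
    return problems


if __name__ == "__main__":
    for problem in report(parse_groups(SAMPLE)):
        print(problem)
```

Run against the sample, this flags all three groups as unsettled and reports node 1 as duplicated in the fence group. It only reads text; it does not change cluster state.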

-- Lon

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
