Re: "dlm_controld[nnnn]: cluster is down, exiting" on node1 when starting node2

Charlie Brady <charlieb-linux-cluster@xxxxxxxxxxxxxxxxxx> · Fri, 5 Jun 2009 12:50:57 -0400 (EDT)

On Fri, 5 Jun 2009, David Teigland wrote:

On Fri, Jun 05, 2009 at 11:42:59AM -0400, Charlie Brady wrote:

On Fri, 5 Jun 2009, David Teigland wrote:

On Thu, Jun 04, 2009 at 04:23:13PM -0400, Charlie Brady wrote:
Jun  4 10:55:34 sun4150node1 dlm_controld[7916]: cluster is down, exiting
Jun  4 10:55:34 sun4150node1 fenced[7910]: cluster is down, exiting
Jun  4 10:55:34 sun4150node1 gfs_controld[7922]: cluster is down, exiting
Jun  4 10:55:35 sun4150node1 qdiskd[8128]: <err> cman_dispatch: Host is down

They are all complaining that the cluster is down, which is a polite way
of saying that aisexec has died/crashed/failed/killed/gone-away.
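A minimal sketch of reading the log the way described above. Four separate daemons report losing the cluster within about a second, which points to one shared cause rather than four independent failures. The helper name `cluster_down_reports` and the `DOWN_PATTERNS` list are hypothetical and are not part of any cluster tool. The regex assumes the traditional syslog line layout shown in the excerpt.

```python
import re

# Log excerpt from node1 as quoted in this thread (qdiskd line unwrapped).
LOG = """\
Jun  4 10:55:34 sun4150node1 dlm_controld[7916]: cluster is down, exiting
Jun  4 10:55:34 sun4150node1 fenced[7910]: cluster is down, exiting
Jun  4 10:55:34 sun4150node1 gfs_controld[7922]: cluster is down, exiting
Jun  4 10:55:35 sun4150node1 qdiskd[8128]: <err> cman_dispatch: Host is down
"""

# Assumed syslog layout: "<Mon> <day> <HH:MM:SS> <host> <daemon>[<pid>]: <msg>"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+"
    r"(?P<daemon>[\w.-]+)\[(?P<pid>\d+)\]:\s+(?P<msg>.*)$"
)

# Hypothetical list: the message fragments quoted above that signal the
# daemon lost its connection to the cluster manager.
DOWN_PATTERNS = ("cluster is down", "Host is down")


def cluster_down_reports(log_text):
    """Map each daemon to the timestamp of its first 'cluster down' message."""
    reports = {}
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that don't look like syslog entries
        if any(p in m.group("msg") for p in DOWN_PATTERNS):
            # keep only the first report per daemon, in log order
            reports.setdefault(m.group("daemon"), m.group("ts"))
    return reports


if __name__ == "__main__":
    for daemon, ts in cluster_down_reports(LOG).items():
        print(f"{ts}  {daemon}")
```

When every daemon on a node reports at once like this, the next place to look is whatever they all depend on (here aisexec), not the individual daemons.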

Thanks. Why might that have occurred? Where would I look for clues? How
can I increase logging output from aisexec?

If you're lucky it'll leave a core file; otherwise aisexec is notorious for
disappearing without leaving any clues about why.

That's very disconcerting to hear. Doesn't sound like HA. :-(

I don't have any core files.

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster