On 03/10/14 10:35 AM, Daniel Dehennin wrote:
Hello,
I'm trying to set up pacemaker+corosync on Debian Wheezy to access a SAN
for an OpenNebula cluster.
As I'm new to the cluster world, I have a hard time figuring out why
things sometimes go really wrong and where I should look for answers.
My OpenNebula frontend, running in a VM, does not manage to start its
resources, and my syslog is full of:
#+begin_src
ocfs2_controld: Unable to open checkpoint "ocfs2:controld": Object does not exist
#+end_src
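As a basic sanity check (nothing OpenNebula-specific, just confirming the
control daemons are alive and pulling their recent log lines for context):
#+begin_src
# Check that the DLM/OCFS2/cLVM control daemons are running on this node
ps -e | grep -E 'dlm_controld|ocfs2_controld|clvmd'
# Pull their recent syslog messages for context around the error
grep -E 'dlm_controld|ocfs2_controld' /var/log/syslog | tail -50
#+end_src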
When this happens, the other nodes have problems:
#+begin_src
root@nebula3:~# LANG=C vgscan
cluster request failed: Host is down
Unable to obtain global lock.
#+end_src
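While clvmd is stuck like this, the LVM metadata can still be inspected by
disabling cluster locking for a single read-only command, for example:
#+begin_src
# Bypass cluster locking for one read-only query while clvmd is stuck;
# locking_type = 0 disables locking entirely, so never use it for changes.
root@nebula3:~# vgs --config 'global { locking_type = 0 }'
# Check whether clvmd itself is still alive on this node
root@nebula3:~# pgrep -l clvmd
#+end_src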
But things look fine in “crm_mon”:
#+begin_src
root@nebula3:~# crm_mon -1
============
Last updated: Fri Oct 3 16:25:43 2014
Last change: Fri Oct 3 14:51:59 2014 via cibadmin on nebula1
Stack: openais
Current DC: nebula3 - partition with quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
5 Nodes configured, 5 expected votes
32 Resources configured.
============
Node quorum: standby
Online: [ nebula3 nebula2 nebula1 ]
OFFLINE: [ one ]
Stonith-nebula3-IPMILAN (stonith:external/ipmi): Started nebula2
Stonith-nebula2-IPMILAN (stonith:external/ipmi): Started nebula3
Stonith-nebula1-IPMILAN (stonith:external/ipmi): Started nebula2
Clone Set: ONE-Storage-Clone [ONE-Storage]
Started: [ nebula1 nebula3 nebula2 ]
Stopped: [ ONE-Storage:3 ONE-Storage:4 ]
Quorum-Node (ocf::heartbeat:VirtualDomain): Started nebula3
Stonith-Quorum-Node (stonith:external/libvirt): Started nebula3
#+end_src
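Since crm_mon only shows Pacemaker's view, I also compare it against
corosync's own membership. On the corosync 1.x/openais stack that Wheezy
ships, something like this should show the ring status and member list:
#+begin_src
# Ring status as corosync sees it
root@nebula3:~# corosync-cfgtool -s
# Totem membership from the corosync 1.x object database
root@nebula3:~# corosync-objctl | grep member
#+end_src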
I don't know how to interpret the dlm_tool output:
#+begin_src
root@nebula3:~# dlm_tool ls -n
dlm lockspaces
name CCB10CE8D4FF489B9A2ECB288DACF2D7
id 0x09250e49
flags 0x00000008 fs_reg
change member 3 joined 1 remove 0 failed 0 seq 2,2
members 1189587136 1206364352 1223141568
all nodes
nodeid 1189587136 member 1 failed 0 start 1 seq_add 1 seq_rem 0 check none
nodeid 1206364352 member 1 failed 0 start 1 seq_add 2 seq_rem 0 check none
nodeid 1223141568 member 1 failed 0 start 1 seq_add 1 seq_rem 0 check none
name clvmd
id 0x4104eefa
flags 0x00000000
change member 3 joined 0 remove 1 failed 0 seq 4,4
members 1189587136 1206364352 1223141568
all nodes
nodeid 1172809920 member 0 failed 0 start 0 seq_add 3 seq_rem 4 check none
nodeid 1189587136 member 1 failed 0 start 1 seq_add 1 seq_rem 0 check none
nodeid 1206364352 member 1 failed 0 start 1 seq_add 2 seq_rem 0 check none
nodeid 1223141568 member 1 failed 0 start 1 seq_add 1 seq_rem 0 check none
#+end_src
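The only pattern I can spot is that the node IDs look like auto-generated
corosync ones (the ring0 IPv4 address packed into 32 bits), so the extra
nodeid 1172809920 in the clvmd lockspace, with member 0 and seq_rem 4,
would be a node that already left. Assuming little-endian byte order, a
nodeid can be decoded back to an address with plain shell arithmetic:
#+begin_src
# Decode an auto-generated corosync nodeid back to a dotted quad.
# Assumes the nodeid is the ring0 IPv4 address stored little-endian.
nodeid=1189587136
printf '%d.%d.%d.%d\n' \
    $((  nodeid        & 0xff )) \
    $(( (nodeid >> 8)  & 0xff )) \
    $(( (nodeid >> 16) & 0xff )) \
    $(( (nodeid >> 24) & 0xff ))
# prints 192.168.231.70 under that assumption
#+end_src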
Is there any documentation on troubleshooting DLM/cLVM?
Regards.
Can you paste your full pacemaker config and the logs from the other
nodes starting just before the lost node went away?
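If it helps, "crm configure show" will dump the whole config, and
crm_report can gather the logs from all nodes in one go; something like
(the timestamp is just an example, set it shortly before the node was
lost):
#+begin_src
# Dump the full CIB configuration in crm shell syntax
root@nebula3:~# crm configure show
# Collect logs and cluster state from all nodes into a tarball,
# starting just before the node went away
root@nebula3:~# crm_report -f "2014-10-03 14:00:00" /tmp/cluster-report
#+end_src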
--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?