Hello,

I'm trying to set up pacemaker+corosync on Debian Wheezy to access a SAN for an OpenNebula cluster.

As I'm new to the cluster world, I have a hard time figuring out why things sometimes go really wrong and where I must look to find answers.

My OpenNebula frontend, running in a VM, does not manage to run the resources, and my syslog has a lot of:

#+begin_src
ocfs2_controld: Unable to open checkpoint "ocfs2:controld": Object does not exist
#+end_src

When this happens, the other nodes have problems:

#+begin_src
root@nebula3:~# LANG=C vgscan
cluster request failed: Host is down
Unable to obtain global lock.
#+end_src

But things look fine in “crm_mon”:

#+begin_src
root@nebula3:~# crm_mon -1
============
Last updated: Fri Oct 3 16:25:43 2014
Last change: Fri Oct 3 14:51:59 2014 via cibadmin on nebula1
Stack: openais
Current DC: nebula3 - partition with quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
5 Nodes configured, 5 expected votes
32 Resources configured.
============

Node quorum: standby
Online: [ nebula3 nebula2 nebula1 ]
OFFLINE: [ one ]

Stonith-nebula3-IPMILAN (stonith:external/ipmi): Started nebula2
Stonith-nebula2-IPMILAN (stonith:external/ipmi): Started nebula3
Stonith-nebula1-IPMILAN (stonith:external/ipmi): Started nebula2
Clone Set: ONE-Storage-Clone [ONE-Storage]
    Started: [ nebula1 nebula3 nebula2 ]
    Stopped: [ ONE-Storage:3 ONE-Storage:4 ]
Quorum-Node (ocf::heartbeat:VirtualDomain): Started nebula3
Stonith-Quorum-Node (stonith:external/libvirt): Started nebula3
#+end_src

I don't know how to interpret the dlm_tool information:

#+begin_src
root@nebula3:~# dlm_tool ls -n
dlm lockspaces
name CCB10CE8D4FF489B9A2ECB288DACF2D7
id 0x09250e49
flags 0x00000008 fs_reg
change member 3 joined 1 remove 0 failed 0 seq 2,2
members 1189587136 1206364352 1223141568
all nodes
nodeid 1189587136 member 1 failed 0 start 1 seq_add 1 seq_rem 0 check none
nodeid 1206364352 member 1 failed 0 start 1 seq_add 2 seq_rem 0 check none
nodeid 1223141568 member 1 failed 0 start 1 seq_add 1 seq_rem 0 check none

name clvmd
id 0x4104eefa
flags 0x00000000
change member 3 joined 0 remove 1 failed 0 seq 4,4
members 1189587136 1206364352 1223141568
all nodes
nodeid 1172809920 member 0 failed 0 start 0 seq_add 3 seq_rem 4 check none
nodeid 1189587136 member 1 failed 0 start 1 seq_add 1 seq_rem 0 check none
nodeid 1206364352 member 1 failed 0 start 1 seq_add 2 seq_rem 0 check none
nodeid 1223141568 member 1 failed 0 start 1 seq_add 1 seq_rem 0 check none
#+end_src
1412340044 dlm_controld 3.0.12 started
1412340044 found /dev/misc/dlm-control minor 58
1412340044 found /dev/misc/dlm-monitor minor 57
1412340044 found /dev/misc/dlm_plock minor 56
1412340044 /dev/misc/dlm-monitor fd 11
1412340044 /sys/kernel/config/dlm/cluster/comms: opendir failed: 2
1412340044 /sys/kernel/config/dlm/cluster/spaces: opendir failed: 2
1412340044 totem/rrp_mode = 'none'
1412340044 set protocol 0
1412340044 group_mode 3 compat 0
1412340044 setup_cpg_daemon 13
1412340044 dlm:controld conf 2 1 0 memb 1189587136 1223141568 join 1223141568 left
1412340044 run protocol from nodeid 1189587136
1412340044 daemon run 1.1.1 max 1.1.1 kernel run 1.1.1 max 1.1.1
1412340044 plocks 15
1412340044 plock cpg message size: 104 bytes
1412340044 Processing membership 22676
1412340044 Adding address ip(192.168.231.70) to configfs for node 1189587136
1412340044 set_configfs_node 1189587136 192.168.231.70 local 0
1412340044 Added active node 1189587136: born-on=22628, last-seen=22676, this-event=22676, last-event=0
1412340044 Adding address ip(192.168.231.71) to configfs for node 1206364352
1412340044 set_configfs_node 1206364352 192.168.231.71 local 0
1412340044 Added active node 1206364352: born-on=22632, last-seen=22676, this-event=22676, last-event=0
1412340044 Adding address ip(192.168.231.72) to configfs for node 1223141568
1412340044 set_configfs_node 1223141568 192.168.231.72 local 1
1412340044 Added active node 1223141568: born-on=22636, last-seen=22676, this-event=22676, last-event=0
1412340044 dlm:controld conf 3 1 0 memb 1189587136 1206364352 1223141568 join 1206364352 left
1412340045 client connection 5 fd 16
1412340047 uevent: add@/kernel/dlm/clvmd
1412340047 kernel: add@ clvmd
1412340047 uevent: online@/kernel/dlm/clvmd
1412340047 kernel: online@ clvmd
1412340047 dlm:ls:clvmd conf 2 1 0 memb 1189587136 1223141568 join 1223141568 left
1412340047 clvmd add_change cg 1 joined nodeid 1223141568
1412340047 clvmd add_change cg 1 we joined
1412340047 clvmd add_change cg 1 counts member 2 joined 1 remove 0 failed 0
1412340047 clvmd check_fencing done
1412340047 clvmd check_quorum disabled
1412340047 clvmd check_fs none registered
1412340047 clvmd send_start cg 1 flags 1 data2 0 counts 0 2 1 0 0
1412340047 clvmd receive_start 1189587136:2 len 80
1412340047 clvmd match_change 1189587136:2 matches cg 1
1412340047 clvmd wait_messages cg 1 need 1 of 2
1412340047 clvmd receive_start 1223141568:1 len 80
1412340047 clvmd match_change 1223141568:1 matches cg 1
1412340047 clvmd wait_messages cg 1 got all 2
1412340047 clvmd start_kernel cg 1 member_count 2
1412340047 write "1090842362" to "/sys/kernel/dlm/clvmd/id"
1412340047 set_members mkdir "/sys/kernel/config/dlm/cluster/spaces/clvmd/nodes/1189587136"
1412340047 set_members mkdir "/sys/kernel/config/dlm/cluster/spaces/clvmd/nodes/1223141568"
1412340047 write "1" to "/sys/kernel/dlm/clvmd/control"
1412340047 write "0" to "/sys/kernel/dlm/clvmd/event_done"
1412340047 clvmd set_plock_ckpt_node from 0 to 1189587136
1412340047 clvmd receive_plocks_stored 1189587136:2 flags a sig 0 need_plocks 1
1412340047 clvmd match_change 1189587136:2 matches cg 1
1412340047 clvmd retrieve_plocks
1412340047 clvmd retrieve_plocks first 0 last 0 r_count 0 p_count 0 sig 0
1412340047 uevent: add@/devices/virtual/misc/dlm_clvmd
1412340047 dlm:ls:clvmd conf 3 1 0 memb 1189587136 1206364352 1223141568 join 1206364352 left
1412340047 clvmd add_change cg 2 joined nodeid 1206364352
1412340047 clvmd add_change cg 2 counts member 3 joined 1 remove 0 failed 0
1412340047 clvmd stop_kernel cg 2
1412340047 write "0" to "/sys/kernel/dlm/clvmd/control"
1412340047 clvmd check_fencing done
1412340047 clvmd check_quorum disabled
1412340047 clvmd check_fs none registered
1412340047 clvmd send_start cg 2 flags 2 data2 0 counts 1 3 1 0 0
1412340047 clvmd receive_start 1206364352:1 len 84
1412340047 clvmd match_change 1206364352:1 matches cg 2
1412340047 clvmd wait_messages cg 2 need 2 of 3
1412340047 clvmd receive_start 1189587136:3 len 84
1412340047 clvmd match_change 1189587136:3 matches cg 2
1412340047 clvmd wait_messages cg 2 need 1 of 3
1412340047 clvmd receive_start 1223141568:2 len 84
1412340047 clvmd match_change 1223141568:2 matches cg 2
1412340047 clvmd wait_messages cg 2 got all 3
1412340047 clvmd start_kernel cg 2 member_count 3
1412340047 dir_member 1223141568
1412340047 dir_member 1189587136
1412340047 set_members mkdir "/sys/kernel/config/dlm/cluster/spaces/clvmd/nodes/1206364352"
1412340047 write "1" to "/sys/kernel/dlm/clvmd/control"
1412340047 clvmd set_plock_ckpt_node from 1189587136 to 1189587136
1412340047 clvmd receive_plocks_stored 1189587136:3 flags a sig 0 need_plocks 0
1412340049 uevent: add@/kernel/dlm/CCB10CE8D4FF489B9A2ECB288DACF2D7
1412340049 kernel: add@ CCB10CE8D4FF489B9A2ECB288DACF2D7
1412340049 uevent: online@/kernel/dlm/CCB10CE8D4FF489B9A2ECB288DACF2D7
1412340049 kernel: online@ CCB10CE8D4FF489B9A2ECB288DACF2D7
1412340049 dlm:ls:CCB10CE8D4FF489B9A2ECB288DACF2D7 conf 2 1 0 memb 1189587136 1223141568 join 1223141568 left
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 add_change cg 1 joined nodeid 1223141568
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 add_change cg 1 we joined
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 add_change cg 1 counts member 2 joined 1 remove 0 failed 0
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 check_fencing done
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 check_quorum disabled
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 check_fs done
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 send_start cg 1 flags 1 data2 0 counts 0 2 1 0 0
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 receive_start 1223141568:1 len 80
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 match_change 1223141568:1 matches cg 1
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 wait_messages cg 1 need 1 of 2
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 receive_start 1189587136:2 len 80
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 match_change 1189587136:2 matches cg 1
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 wait_messages cg 1 got all 2
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 start_kernel cg 1 member_count 2
1412340049 write "153423433" to "/sys/kernel/dlm/CCB10CE8D4FF489B9A2ECB288DACF2D7/id"
1412340049 set_members mkdir "/sys/kernel/config/dlm/cluster/spaces/CCB10CE8D4FF489B9A2ECB288DACF2D7/nodes/1189587136"
1412340049 set_members mkdir "/sys/kernel/config/dlm/cluster/spaces/CCB10CE8D4FF489B9A2ECB288DACF2D7/nodes/1223141568"
1412340049 write "1" to "/sys/kernel/dlm/CCB10CE8D4FF489B9A2ECB288DACF2D7/control"
1412340049 write "0" to "/sys/kernel/dlm/CCB10CE8D4FF489B9A2ECB288DACF2D7/event_done"
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 set_plock_ckpt_node from 0 to 1189587136
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 receive_plocks_stored 1189587136:2 flags a sig 0 need_plocks 1
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 match_change 1189587136:2 matches cg 1
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 retrieve_plocks
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 retrieve_plocks first 0 last 0 r_count 0 p_count 0 sig 0
1412340049 dlm:ls:CCB10CE8D4FF489B9A2ECB288DACF2D7 conf 3 1 0 memb 1189587136 1206364352 1223141568 join 1206364352 left
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 add_change cg 2 joined nodeid 1206364352
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 add_change cg 2 counts member 3 joined 1 remove 0 failed 0
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 stop_kernel cg 2
1412340049 write "0" to "/sys/kernel/dlm/CCB10CE8D4FF489B9A2ECB288DACF2D7/control"
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 check_fencing done
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 check_quorum disabled
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 check_fs done
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 send_start cg 2 flags 2 data2 0 counts 1 3 1 0 0
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 receive_start 1206364352:1 len 84
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 match_change 1206364352:1 matches cg 2
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 wait_messages cg 2 need 2 of 3
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 receive_start 1189587136:3 len 84
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 match_change 1189587136:3 matches cg 2
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 wait_messages cg 2 need 1 of 3
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 receive_start 1223141568:2 len 84
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 match_change 1223141568:2 matches cg 2
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 wait_messages cg 2 got all 3
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 start_kernel cg 2 member_count 3
1412340049 dir_member 1223141568
1412340049 dir_member 1189587136
1412340049 set_members mkdir "/sys/kernel/config/dlm/cluster/spaces/CCB10CE8D4FF489B9A2ECB288DACF2D7/nodes/1206364352"
1412340049 write "1" to "/sys/kernel/dlm/CCB10CE8D4FF489B9A2ECB288DACF2D7/control"
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 set_plock_ckpt_node from 1189587136 to 1189587136
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 receive_plocks_stored 1189587136:3 flags a sig 0 need_plocks 0
1412340173 Processing membership 22680
1412340173 Adding address ip(192.168.231.68) to configfs for node 1156032704
1412340173 set_configfs_node 1156032704 192.168.231.68 local 0
1412340173 Added active node 1156032704: born-on=0, last-seen=22680, this-event=22680, last-event=22676
1412340173 Skipped active node 1189587136: born-on=22628, last-seen=22680, this-event=22680, last-event=22676
1412340173 Skipped active node 1206364352: born-on=22632, last-seen=22680, this-event=22680, last-event=22676
1412340173 Skipped active node 1223141568: born-on=22636, last-seen=22680, this-event=22680, last-event=22676
1412340294 Processing membership 22684
1412340294 Skipped active node 1156032704: born-on=22680, last-seen=22684, this-event=22684, last-event=22680
1412340294 Adding address ip(192.168.231.69) to configfs for node 1172809920
1412340294 set_configfs_node 1172809920 192.168.231.69 local 0
1412340294 Added active node 1172809920: born-on=0, last-seen=22684, this-event=22684, last-event=22680
1412340294 Skipped active node 1189587136: born-on=22628, last-seen=22684, this-event=22684, last-event=22680
1412340294 Skipped active node 1206364352: born-on=22632, last-seen=22684, this-event=22684, last-event=22680
1412340294 Skipped active node 1223141568: born-on=22636, last-seen=22684, this-event=22684, last-event=22680
1412340439 dlm:controld conf 4 1 0 memb 1172809920 1189587136 1206364352 1223141568 join 1172809920 left
1412340443 dlm:ls:clvmd conf 4 1 0 memb 1172809920 1189587136 1206364352 1223141568 join 1172809920 left
1412340443 clvmd add_change cg 3 joined nodeid 1172809920
1412340443 clvmd add_change cg 3 counts member 4 joined 1 remove 0 failed 0
1412340443 clvmd stop_kernel cg 3
1412340443 write "0" to "/sys/kernel/dlm/clvmd/control"
1412340443 clvmd check_fencing done
1412340443 clvmd check_quorum disabled
1412340443 clvmd check_fs none registered
1412340443 clvmd send_start cg 3 flags 2 data2 0 counts 2 4 1 0 0
1412340443 clvmd receive_start 1206364352:2 len 88
1412340443 clvmd match_change 1206364352:2 matches cg 3
1412340443 clvmd wait_messages cg 3 need 3 of 4
1412340443 clvmd receive_start 1223141568:3 len 88
1412340443 clvmd match_change 1223141568:3 matches cg 3
1412340443 clvmd wait_messages cg 3 need 2 of 4
1412340443 clvmd receive_start 1172809920:1 len 88
1412340443 clvmd match_change 1172809920:1 matches cg 3
1412340443 clvmd wait_messages cg 3 need 1 of 4
1412340443 clvmd receive_start 1189587136:4 len 88
1412340443 clvmd match_change 1189587136:4 matches cg 3
1412340443 clvmd wait_messages cg 3 got all 4
1412340443 clvmd start_kernel cg 3 member_count 4
1412340443 dir_member 1206364352
1412340443 dir_member 1223141568
1412340443 dir_member 1189587136
1412340443 set_members mkdir "/sys/kernel/config/dlm/cluster/spaces/clvmd/nodes/1172809920"
1412340443 write "1" to "/sys/kernel/dlm/clvmd/control"
1412340443 clvmd set_plock_ckpt_node from 1189587136 to 1189587136
1412340443 clvmd receive_plocks_stored 1189587136:4 flags a sig 0 need_plocks 0
1412340447 dlm:ls:clvmd conf 3 0 1 memb 1189587136 1206364352 1223141568 join left 1172809920
1412340447 clvmd add_change cg 4 remove nodeid 1172809920 reason 2
1412340447 clvmd add_change cg 4 counts member 3 joined 0 remove 1 failed 0
1412340447 clvmd stop_kernel cg 4
1412340447 write "0" to "/sys/kernel/dlm/clvmd/control"
1412340447 clvmd check_fencing done
1412340447 clvmd check_quorum disabled
1412340447 clvmd check_fs none registered
1412340447 clvmd send_start cg 4 flags 2 data2 0 counts 3 3 0 1 0
1412340447 clvmd receive_start 1223141568:4 len 84
1412340447 clvmd match_change 1223141568:4 matches cg 4
1412340447 clvmd wait_messages cg 4 need 2 of 3
1412340447 clvmd receive_start 1189587136:5 len 84
1412340447 clvmd match_change 1189587136:5 matches cg 4
1412340447 clvmd wait_messages cg 4 need 1 of 3
1412340447 clvmd receive_start 1206364352:3 len 84
1412340447 clvmd match_change 1206364352:3 matches cg 4
1412340447 clvmd wait_messages cg 4 got all 3
1412340447 clvmd start_kernel cg 4 member_count 3
1412340447 dir_member 1172809920
1412340447 dir_member 1206364352
1412340447 dir_member 1223141568
1412340447 dir_member 1189587136
1412340447 set_members rmdir "/sys/kernel/config/dlm/cluster/spaces/clvmd/nodes/1172809920"
1412340447 write "1" to "/sys/kernel/dlm/clvmd/control"
1412340447 clvmd set_plock_ckpt_node from 1189587136 to 1189587136
1412340447 clvmd receive_plocks_stored 1189587136:5 flags a sig 0 need_plocks 0
1412340448 dlm:controld conf 3 0 1 memb 1189587136 1206364352 1223141568 join left 1172809920
1412340448 dlm:controld conf 3 0 1 memb 1189587136 1206364352 1223141568 join left 1172809920
1412340507 Processing membership 22688
1412340507 Skipped active node 1156032704: born-on=22680, last-seen=22688, this-event=22688, last-event=22684
1412340507 del_configfs_node rmdir "/sys/kernel/config/dlm/cluster/comms/1172809920"
1412340507 Removed inactive node 1172809920: born-on=22684, last-seen=22684, this-event=22688, last-event=22684
1412340507 Skipped active node 1189587136: born-on=22628, last-seen=22688, this-event=22688, last-event=22684
1412340507 Skipped active node 1206364352: born-on=22632, last-seen=22688, this-event=22688, last-event=22684
1412340507 Skipped active node 1223141568: born-on=22636, last-seen=22688, this-event=22688, last-event=22684
1412340532 Processing membership 22692
1412340532 Skipped active node 1156032704: born-on=22680, last-seen=22692, this-event=22692, last-event=22688
1412340532 Adding address ip(192.168.231.69) to configfs for node 1172809920
1412340532 set_configfs_node 1172809920 192.168.231.69 local 0
1412340532 Added active node 1172809920: born-on=22684, last-seen=22692, this-event=22692, last-event=22688
1412340532 Skipped active node 1189587136: born-on=22628, last-seen=22692, this-event=22692, last-event=22688
1412340532 Skipped active node 1206364352: born-on=22632, last-seen=22692, this-event=22692, last-event=22688
1412340532 Skipped active node 1223141568: born-on=22636, last-seen=22692, this-event=22692, last-event=22688
1412340570 Processing membership 22696
1412340570 Skipped active node 1156032704: born-on=22680, last-seen=22696, this-event=22696, last-event=22692
1412340570 del_configfs_node rmdir "/sys/kernel/config/dlm/cluster/comms/1172809920"
1412340570 Removed inactive node 1172809920: born-on=22692, last-seen=22692, this-event=22696, last-event=22692
1412340570 Skipped active node 1189587136: born-on=22628, last-seen=22696, this-event=22696, last-event=22692
1412340570 Skipped active node 1206364352: born-on=22632, last-seen=22696, this-event=22696, last-event=22692
1412340570 Skipped active node 1223141568: born-on=22636, last-seen=22696, this-event=22696, last-event=22692
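
One thing that may help when reading the output above: judging by the set_configfs_node lines in the log (for example 1189587136 next to 192.168.231.70), the node IDs look like the node's IPv4 address read as a little-endian 32-bit integer. Below is a minimal Python sketch under that assumption; `nodeid_to_ip` is a helper name made up for illustration, not part of any DLM tool, and the list of IDs is simply copied from the log.

```python
import socket
import struct


def nodeid_to_ip(nodeid: int) -> str:
    """Decode a node ID as a dotted-quad IPv4 address.

    Assumption: the ID is the IPv4 address stored as a little-endian
    unsigned 32-bit integer, which matches the pairs shown in the log.
    """
    return socket.inet_ntoa(struct.pack("<I", nodeid))


# Node IDs that appear in the dlm_tool output and in the log above.
NODEIDS = [1156032704, 1172809920, 1189587136, 1206364352, 1223141568]

for nodeid in NODEIDS:
    print(nodeid, nodeid_to_ip(nodeid))
```

With this mapping, the node that joins and then leaves the clvmd lockspace (1172809920) would be the one at 192.168.231.69, and the node marked "local 1" in the log (1223141568) would be 192.168.231.72.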
Is there any documentation on troubleshooting DLM/cLVM?

Regards.
--
Daniel Dehennin
Get my GPG key: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF
-- Linux-cluster mailing list Linux-cluster@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/linux-cluster