cLVM unusable on quorated cluster

Hello,

I'm trying to set up pacemaker+corosync on Debian Wheezy to access a SAN
for an OpenNebula cluster.

As I'm new to the cluster world, I have a hard time figuring out why
things sometimes go badly wrong and where I should look for answers.

My OpenNebula frontend, running in a VM, cannot run the resources, and
my syslog contains many lines like:

#+begin_src
ocfs2_controld: Unable to open checkpoint "ocfs2:controld": Object does not exist
#+end_src

When this happens, the other nodes have problems too:

#+begin_src
root@nebula3:~# LANG=C vgscan
  cluster request failed: Host is down
  Unable to obtain global lock.
#+end_src

But things look fine in “crm_mon”:

#+begin_src
root@nebula3:~# crm_mon -1
============
Last updated: Fri Oct  3 16:25:43 2014
Last change: Fri Oct  3 14:51:59 2014 via cibadmin on nebula1
Stack: openais
Current DC: nebula3 - partition with quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
5 Nodes configured, 5 expected votes
32 Resources configured.
============

Node quorum: standby
Online: [ nebula3 nebula2 nebula1 ]
OFFLINE: [ one ]

 Stonith-nebula3-IPMILAN    (stonith:external/ipmi):    Started nebula2
 Stonith-nebula2-IPMILAN    (stonith:external/ipmi):    Started nebula3
 Stonith-nebula1-IPMILAN    (stonith:external/ipmi):    Started nebula2
 Clone Set: ONE-Storage-Clone [ONE-Storage]
     Started: [ nebula1 nebula3 nebula2 ]
     Stopped: [ ONE-Storage:3 ONE-Storage:4 ]
 Quorum-Node    (ocf::heartbeat:VirtualDomain): Started nebula3
 Stonith-Quorum-Node   (stonith:external/libvirt):   Started nebula3
#+end_src
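
The "partition with quorum" line can be checked by hand. This is only a sketch that assumes a plain majority rule (strictly more than half of the expected votes); the function name is mine and the numbers come from the crm_mon output above:

#+begin_src python
def has_quorum(expected_votes: int, online_votes: int) -> bool:
    """Assumed simple majority rule: more than half of the expected votes."""
    needed = expected_votes // 2 + 1
    return online_votes >= needed


# Values from the crm_mon output: 5 expected votes, 3 nodes online.
print(has_quorum(5, 3))
#+end_src

With 5 expected votes the threshold is 3, so the three online nodes are enough under that assumption, and losing one more node would drop quorum.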

I don't know how to interpret the dlm_tool output:

#+begin_src
root@nebula3:~# dlm_tool ls -n
dlm lockspaces
name          CCB10CE8D4FF489B9A2ECB288DACF2D7
id            0x09250e49
flags         0x00000008 fs_reg
change        member 3 joined 1 remove 0 failed 0 seq 2,2
members       1189587136 1206364352 1223141568 
all nodes
nodeid 1189587136 member 1 failed 0 start 1 seq_add 1 seq_rem 0 check none
nodeid 1206364352 member 1 failed 0 start 1 seq_add 2 seq_rem 0 check none
nodeid 1223141568 member 1 failed 0 start 1 seq_add 1 seq_rem 0 check none

name          clvmd
id            0x4104eefa
flags         0x00000000 
change        member 3 joined 0 remove 1 failed 0 seq 4,4
members       1189587136 1206364352 1223141568 
all nodes
nodeid 1172809920 member 0 failed 0 start 0 seq_add 3 seq_rem 4 check none
nodeid 1189587136 member 1 failed 0 start 1 seq_add 1 seq_rem 0 check none
nodeid 1206364352 member 1 failed 0 start 1 seq_add 2 seq_rem 0 check none
nodeid 1223141568 member 1 failed 0 start 1 seq_add 1 seq_rem 0 check none
#+end_src
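
The node IDs look like the IPv4 address of each node read as a little-endian 32-bit integer. That is an assumption based on the pairs that dlm_controld logs itself (for example 1189587136 with 192.168.231.70), not on any documentation. A small sketch to decode them, with a helper name of my own:

#+begin_src python
import socket
import struct


def nodeid_to_ip(nodeid: int) -> str:
    """Decode a node ID, assuming it is an IPv4 address stored little-endian."""
    # Packing as little-endian yields the four address octets in network order.
    return socket.inet_ntoa(struct.pack("<I", nodeid))


# Node IDs taken from the clvmd lockspace listing above.
for nodeid in (1172809920, 1189587136, 1206364352, 1223141568):
    print(nodeid, nodeid_to_ip(nodeid))
#+end_src

If the assumption holds, the non-member 1172809920 in the clvmd lockspace is 192.168.231.69.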

The dlm_controld log:

1412340044 dlm_controld 3.0.12 started
1412340044 found /dev/misc/dlm-control minor 58
1412340044 found /dev/misc/dlm-monitor minor 57
1412340044 found /dev/misc/dlm_plock minor 56
1412340044 /dev/misc/dlm-monitor fd 11
1412340044 /sys/kernel/config/dlm/cluster/comms: opendir failed: 2
1412340044 /sys/kernel/config/dlm/cluster/spaces: opendir failed: 2
1412340044 totem/rrp_mode = 'none'
1412340044 set protocol 0
1412340044 group_mode 3 compat 0
1412340044 setup_cpg_daemon 13
1412340044 dlm:controld conf 2 1 0 memb 1189587136 1223141568 join 1223141568 left
1412340044 run protocol from nodeid 1189587136
1412340044 daemon run 1.1.1 max 1.1.1 kernel run 1.1.1 max 1.1.1
1412340044 plocks 15
1412340044 plock cpg message size: 104 bytes
1412340044 Processing membership 22676
1412340044 Adding address ip(192.168.231.70) to configfs for node 1189587136
1412340044 set_configfs_node 1189587136 192.168.231.70 local 0
1412340044 Added active node 1189587136: born-on=22628, last-seen=22676, this-event=22676, last-event=0
1412340044 Adding address ip(192.168.231.71) to configfs for node 1206364352
1412340044 set_configfs_node 1206364352 192.168.231.71 local 0
1412340044 Added active node 1206364352: born-on=22632, last-seen=22676, this-event=22676, last-event=0
1412340044 Adding address ip(192.168.231.72) to configfs for node 1223141568
1412340044 set_configfs_node 1223141568 192.168.231.72 local 1
1412340044 Added active node 1223141568: born-on=22636, last-seen=22676, this-event=22676, last-event=0
1412340044 dlm:controld conf 3 1 0 memb 1189587136 1206364352 1223141568 join 1206364352 left
1412340045 client connection 5 fd 16
1412340047 uevent: add@/kernel/dlm/clvmd
1412340047 kernel: add@ clvmd
1412340047 uevent: online@/kernel/dlm/clvmd
1412340047 kernel: online@ clvmd
1412340047 dlm:ls:clvmd conf 2 1 0 memb 1189587136 1223141568 join 1223141568 left
1412340047 clvmd add_change cg 1 joined nodeid 1223141568
1412340047 clvmd add_change cg 1 we joined
1412340047 clvmd add_change cg 1 counts member 2 joined 1 remove 0 failed 0
1412340047 clvmd check_fencing done
1412340047 clvmd check_quorum disabled
1412340047 clvmd check_fs none registered
1412340047 clvmd send_start cg 1 flags 1 data2 0 counts 0 2 1 0 0
1412340047 clvmd receive_start 1189587136:2 len 80
1412340047 clvmd match_change 1189587136:2 matches cg 1
1412340047 clvmd wait_messages cg 1 need 1 of 2
1412340047 clvmd receive_start 1223141568:1 len 80
1412340047 clvmd match_change 1223141568:1 matches cg 1
1412340047 clvmd wait_messages cg 1 got all 2
1412340047 clvmd start_kernel cg 1 member_count 2
1412340047 write "1090842362" to "/sys/kernel/dlm/clvmd/id"
1412340047 set_members mkdir "/sys/kernel/config/dlm/cluster/spaces/clvmd/nodes/1189587136"
1412340047 set_members mkdir "/sys/kernel/config/dlm/cluster/spaces/clvmd/nodes/1223141568"
1412340047 write "1" to "/sys/kernel/dlm/clvmd/control"
1412340047 write "0" to "/sys/kernel/dlm/clvmd/event_done"
1412340047 clvmd set_plock_ckpt_node from 0 to 1189587136
1412340047 clvmd receive_plocks_stored 1189587136:2 flags a sig 0 need_plocks 1
1412340047 clvmd match_change 1189587136:2 matches cg 1
1412340047 clvmd retrieve_plocks
1412340047 clvmd retrieve_plocks first 0 last 0 r_count 0 p_count 0 sig 0
1412340047 uevent: add@/devices/virtual/misc/dlm_clvmd
1412340047 dlm:ls:clvmd conf 3 1 0 memb 1189587136 1206364352 1223141568 join 1206364352 left
1412340047 clvmd add_change cg 2 joined nodeid 1206364352
1412340047 clvmd add_change cg 2 counts member 3 joined 1 remove 0 failed 0
1412340047 clvmd stop_kernel cg 2
1412340047 write "0" to "/sys/kernel/dlm/clvmd/control"
1412340047 clvmd check_fencing done
1412340047 clvmd check_quorum disabled
1412340047 clvmd check_fs none registered
1412340047 clvmd send_start cg 2 flags 2 data2 0 counts 1 3 1 0 0
1412340047 clvmd receive_start 1206364352:1 len 84
1412340047 clvmd match_change 1206364352:1 matches cg 2
1412340047 clvmd wait_messages cg 2 need 2 of 3
1412340047 clvmd receive_start 1189587136:3 len 84
1412340047 clvmd match_change 1189587136:3 matches cg 2
1412340047 clvmd wait_messages cg 2 need 1 of 3
1412340047 clvmd receive_start 1223141568:2 len 84
1412340047 clvmd match_change 1223141568:2 matches cg 2
1412340047 clvmd wait_messages cg 2 got all 3
1412340047 clvmd start_kernel cg 2 member_count 3
1412340047 dir_member 1223141568
1412340047 dir_member 1189587136
1412340047 set_members mkdir "/sys/kernel/config/dlm/cluster/spaces/clvmd/nodes/1206364352"
1412340047 write "1" to "/sys/kernel/dlm/clvmd/control"
1412340047 clvmd set_plock_ckpt_node from 1189587136 to 1189587136
1412340047 clvmd receive_plocks_stored 1189587136:3 flags a sig 0 need_plocks 0
1412340049 uevent: add@/kernel/dlm/CCB10CE8D4FF489B9A2ECB288DACF2D7
1412340049 kernel: add@ CCB10CE8D4FF489B9A2ECB288DACF2D7
1412340049 uevent: online@/kernel/dlm/CCB10CE8D4FF489B9A2ECB288DACF2D7
1412340049 kernel: online@ CCB10CE8D4FF489B9A2ECB288DACF2D7
1412340049 dlm:ls:CCB10CE8D4FF489B9A2ECB288DACF2D7 conf 2 1 0 memb 1189587136 1223141568 join 1223141568 left
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 add_change cg 1 joined nodeid 1223141568
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 add_change cg 1 we joined
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 add_change cg 1 counts member 2 joined 1 remove 0 failed 0
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 check_fencing done
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 check_quorum disabled
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 check_fs done
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 send_start cg 1 flags 1 data2 0 counts 0 2 1 0 0
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 receive_start 1223141568:1 len 80
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 match_change 1223141568:1 matches cg 1
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 wait_messages cg 1 need 1 of 2
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 receive_start 1189587136:2 len 80
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 match_change 1189587136:2 matches cg 1
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 wait_messages cg 1 got all 2
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 start_kernel cg 1 member_count 2
1412340049 write "153423433" to "/sys/kernel/dlm/CCB10CE8D4FF489B9A2ECB288DACF2D7/id"
1412340049 set_members mkdir "/sys/kernel/config/dlm/cluster/spaces/CCB10CE8D4FF489B9A2ECB288DACF2D7/nodes/1189587136"
1412340049 set_members mkdir "/sys/kernel/config/dlm/cluster/spaces/CCB10CE8D4FF489B9A2ECB288DACF2D7/nodes/1223141568"
1412340049 write "1" to "/sys/kernel/dlm/CCB10CE8D4FF489B9A2ECB288DACF2D7/control"
1412340049 write "0" to "/sys/kernel/dlm/CCB10CE8D4FF489B9A2ECB288DACF2D7/event_done"
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 set_plock_ckpt_node from 0 to 1189587136
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 receive_plocks_stored 1189587136:2 flags a sig 0 need_plocks 1
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 match_change 1189587136:2 matches cg 1
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 retrieve_plocks
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 retrieve_plocks first 0 last 0 r_count 0 p_count 0 sig 0
1412340049 dlm:ls:CCB10CE8D4FF489B9A2ECB288DACF2D7 conf 3 1 0 memb 1189587136 1206364352 1223141568 join 1206364352 left
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 add_change cg 2 joined nodeid 1206364352
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 add_change cg 2 counts member 3 joined 1 remove 0 failed 0
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 stop_kernel cg 2
1412340049 write "0" to "/sys/kernel/dlm/CCB10CE8D4FF489B9A2ECB288DACF2D7/control"
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 check_fencing done
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 check_quorum disabled
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 check_fs done
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 send_start cg 2 flags 2 data2 0 counts 1 3 1 0 0
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 receive_start 1206364352:1 len 84
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 match_change 1206364352:1 matches cg 2
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 wait_messages cg 2 need 2 of 3
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 receive_start 1189587136:3 len 84
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 match_change 1189587136:3 matches cg 2
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 wait_messages cg 2 need 1 of 3
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 receive_start 1223141568:2 len 84
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 match_change 1223141568:2 matches cg 2
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 wait_messages cg 2 got all 3
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 start_kernel cg 2 member_count 3
1412340049 dir_member 1223141568
1412340049 dir_member 1189587136
1412340049 set_members mkdir "/sys/kernel/config/dlm/cluster/spaces/CCB10CE8D4FF489B9A2ECB288DACF2D7/nodes/1206364352"
1412340049 write "1" to "/sys/kernel/dlm/CCB10CE8D4FF489B9A2ECB288DACF2D7/control"
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 set_plock_ckpt_node from 1189587136 to 1189587136
1412340049 CCB10CE8D4FF489B9A2ECB288DACF2D7 receive_plocks_stored 1189587136:3 flags a sig 0 need_plocks 0
1412340173 Processing membership 22680
1412340173 Adding address ip(192.168.231.68) to configfs for node 1156032704
1412340173 set_configfs_node 1156032704 192.168.231.68 local 0
1412340173 Added active node 1156032704: born-on=0, last-seen=22680, this-event=22680, last-event=22676
1412340173 Skipped active node 1189587136: born-on=22628, last-seen=22680, this-event=22680, last-event=22676
1412340173 Skipped active node 1206364352: born-on=22632, last-seen=22680, this-event=22680, last-event=22676
1412340173 Skipped active node 1223141568: born-on=22636, last-seen=22680, this-event=22680, last-event=22676
1412340294 Processing membership 22684
1412340294 Skipped active node 1156032704: born-on=22680, last-seen=22684, this-event=22684, last-event=22680
1412340294 Adding address ip(192.168.231.69) to configfs for node 1172809920
1412340294 set_configfs_node 1172809920 192.168.231.69 local 0
1412340294 Added active node 1172809920: born-on=0, last-seen=22684, this-event=22684, last-event=22680
1412340294 Skipped active node 1189587136: born-on=22628, last-seen=22684, this-event=22684, last-event=22680
1412340294 Skipped active node 1206364352: born-on=22632, last-seen=22684, this-event=22684, last-event=22680
1412340294 Skipped active node 1223141568: born-on=22636, last-seen=22684, this-event=22684, last-event=22680
1412340439 dlm:controld conf 4 1 0 memb 1172809920 1189587136 1206364352 1223141568 join 1172809920 left
1412340443 dlm:ls:clvmd conf 4 1 0 memb 1172809920 1189587136 1206364352 1223141568 join 1172809920 left
1412340443 clvmd add_change cg 3 joined nodeid 1172809920
1412340443 clvmd add_change cg 3 counts member 4 joined 1 remove 0 failed 0
1412340443 clvmd stop_kernel cg 3
1412340443 write "0" to "/sys/kernel/dlm/clvmd/control"
1412340443 clvmd check_fencing done
1412340443 clvmd check_quorum disabled
1412340443 clvmd check_fs none registered
1412340443 clvmd send_start cg 3 flags 2 data2 0 counts 2 4 1 0 0
1412340443 clvmd receive_start 1206364352:2 len 88
1412340443 clvmd match_change 1206364352:2 matches cg 3
1412340443 clvmd wait_messages cg 3 need 3 of 4
1412340443 clvmd receive_start 1223141568:3 len 88
1412340443 clvmd match_change 1223141568:3 matches cg 3
1412340443 clvmd wait_messages cg 3 need 2 of 4
1412340443 clvmd receive_start 1172809920:1 len 88
1412340443 clvmd match_change 1172809920:1 matches cg 3
1412340443 clvmd wait_messages cg 3 need 1 of 4
1412340443 clvmd receive_start 1189587136:4 len 88
1412340443 clvmd match_change 1189587136:4 matches cg 3
1412340443 clvmd wait_messages cg 3 got all 4
1412340443 clvmd start_kernel cg 3 member_count 4
1412340443 dir_member 1206364352
1412340443 dir_member 1223141568
1412340443 dir_member 1189587136
1412340443 set_members mkdir "/sys/kernel/config/dlm/cluster/spaces/clvmd/nodes/1172809920"
1412340443 write "1" to "/sys/kernel/dlm/clvmd/control"
1412340443 clvmd set_plock_ckpt_node from 1189587136 to 1189587136
1412340443 clvmd receive_plocks_stored 1189587136:4 flags a sig 0 need_plocks 0
1412340447 dlm:ls:clvmd conf 3 0 1 memb 1189587136 1206364352 1223141568 join left 1172809920
1412340447 clvmd add_change cg 4 remove nodeid 1172809920 reason 2
1412340447 clvmd add_change cg 4 counts member 3 joined 0 remove 1 failed 0
1412340447 clvmd stop_kernel cg 4
1412340447 write "0" to "/sys/kernel/dlm/clvmd/control"
1412340447 clvmd check_fencing done
1412340447 clvmd check_quorum disabled
1412340447 clvmd check_fs none registered
1412340447 clvmd send_start cg 4 flags 2 data2 0 counts 3 3 0 1 0
1412340447 clvmd receive_start 1223141568:4 len 84
1412340447 clvmd match_change 1223141568:4 matches cg 4
1412340447 clvmd wait_messages cg 4 need 2 of 3
1412340447 clvmd receive_start 1189587136:5 len 84
1412340447 clvmd match_change 1189587136:5 matches cg 4
1412340447 clvmd wait_messages cg 4 need 1 of 3
1412340447 clvmd receive_start 1206364352:3 len 84
1412340447 clvmd match_change 1206364352:3 matches cg 4
1412340447 clvmd wait_messages cg 4 got all 3
1412340447 clvmd start_kernel cg 4 member_count 3
1412340447 dir_member 1172809920
1412340447 dir_member 1206364352
1412340447 dir_member 1223141568
1412340447 dir_member 1189587136
1412340447 set_members rmdir "/sys/kernel/config/dlm/cluster/spaces/clvmd/nodes/1172809920"
1412340447 write "1" to "/sys/kernel/dlm/clvmd/control"
1412340447 clvmd set_plock_ckpt_node from 1189587136 to 1189587136
1412340447 clvmd receive_plocks_stored 1189587136:5 flags a sig 0 need_plocks 0
1412340448 dlm:controld conf 3 0 1 memb 1189587136 1206364352 1223141568 join left 1172809920
1412340448 dlm:controld conf 3 0 1 memb 1189587136 1206364352 1223141568 join left 1172809920
1412340507 Processing membership 22688
1412340507 Skipped active node 1156032704: born-on=22680, last-seen=22688, this-event=22688, last-event=22684
1412340507 del_configfs_node rmdir "/sys/kernel/config/dlm/cluster/comms/1172809920"
1412340507 Removed inactive node 1172809920: born-on=22684, last-seen=22684, this-event=22688, last-event=22684
1412340507 Skipped active node 1189587136: born-on=22628, last-seen=22688, this-event=22688, last-event=22684
1412340507 Skipped active node 1206364352: born-on=22632, last-seen=22688, this-event=22688, last-event=22684
1412340507 Skipped active node 1223141568: born-on=22636, last-seen=22688, this-event=22688, last-event=22684
1412340532 Processing membership 22692
1412340532 Skipped active node 1156032704: born-on=22680, last-seen=22692, this-event=22692, last-event=22688
1412340532 Adding address ip(192.168.231.69) to configfs for node 1172809920
1412340532 set_configfs_node 1172809920 192.168.231.69 local 0
1412340532 Added active node 1172809920: born-on=22684, last-seen=22692, this-event=22692, last-event=22688
1412340532 Skipped active node 1189587136: born-on=22628, last-seen=22692, this-event=22692, last-event=22688
1412340532 Skipped active node 1206364352: born-on=22632, last-seen=22692, this-event=22692, last-event=22688
1412340532 Skipped active node 1223141568: born-on=22636, last-seen=22692, this-event=22692, last-event=22688
1412340570 Processing membership 22696
1412340570 Skipped active node 1156032704: born-on=22680, last-seen=22696, this-event=22696, last-event=22692
1412340570 del_configfs_node rmdir "/sys/kernel/config/dlm/cluster/comms/1172809920"
1412340570 Removed inactive node 1172809920: born-on=22692, last-seen=22692, this-event=22696, last-event=22692
1412340570 Skipped active node 1189587136: born-on=22628, last-seen=22696, this-event=22696, last-event=22692
1412340570 Skipped active node 1206364352: born-on=22632, last-seen=22696, this-event=22696, last-event=22692
1412340570 Skipped active node 1223141568: born-on=22636, last-seen=22696, this-event=22696, last-event=22692
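
To follow the membership changes more easily, the "conf" lines of this log can be reduced to a timestamp plus the joined and left node IDs. This is a rough sketch: the line format is inferred only from the lines above, and the regular expression and function name are my own:

#+begin_src python
import re
from datetime import datetime, timezone

# Format inferred from the log:
#   <epoch> dlm:<group> conf <n> <n> <n> memb <ids...> join <ids...> left <ids...>
CONF_RE = re.compile(
    r"^(\d+) dlm:(\S+) conf (\d+) (\d+) (\d+) memb (.*?) join(.*?) left(.*)$"
)


def parse_conf(line):
    """Return a dict describing one 'conf' line, or None if it does not match."""
    m = CONF_RE.match(line.strip())
    if m is None:
        return None
    ts, group, _n_memb, _n_join, _n_left, memb, joined, left = m.groups()
    return {
        "time": datetime.fromtimestamp(int(ts), tz=timezone.utc).isoformat(),
        "group": group,
        "members": [int(x) for x in memb.split()],
        "joined": [int(x) for x in joined.split()],
        "left": [int(x) for x in left.split()],
    }


sample = [
    "1412340443 dlm:ls:clvmd conf 4 1 0 memb 1172809920 1189587136 1206364352 1223141568 join 1172809920 left",
    "1412340447 dlm:ls:clvmd conf 3 0 1 memb 1189587136 1206364352 1223141568 join left 1172809920",
]
for line in sample:
    print(parse_conf(line))
#+end_src

On these two lines it shows node 1172809920 joining the clvmd lockspace and leaving it again four seconds later.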

Is there any documentation on troubleshooting DLM/cLVM?

Regards.

-- 
Daniel Dehennin
Fetch my GPG key: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6  2AAD CC1E 9E5B 7A6F E2DF

-- 
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
