11.09.2015 17:02, Daniel Dehennin wrote:
On a two-node Ubuntu Trusty cluster:
- Linux nebula3 3.13.0-63-generic #103-Ubuntu SMP Fri Aug 14 21:42:59 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
- corosync 2.3.3-1ubuntu1
- pacemaker 1.1.10+git20130802-1ubuntu2.3
- dlm 4.0.1-0ubuntu1
- clvm 2.02.98-6ubuntu2
You need a newer version of this ^
2.02.102 is known to include commit 431eda6. Without that commit the cluster
is unusable in a degraded state, even when just one node is put into standby.
You see timeouts with both nodes online, so that is a different issue,
but upgrading will not hurt.
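To confirm whether an installed clvm package meets the 2.02.102 minimum, compare the numeric upstream components and not the raw strings. Compared as plain strings, "2.02.98" sorts after "2.02.102". The sketch below is a hypothetical helper. It assumes Debian/Ubuntu-style package versions: an optional "epoch:" prefix, a purely numeric dot-separated upstream part, and an optional "-revision" suffix.

```python
# 2.02.102 is known to include commit 431eda6.
MINIMUM_CLVM = (2, 2, 102)


def upstream_version(pkg_version: str) -> tuple:
    """Turn a Debian-style version such as '2.02.98-6ubuntu2' into (2, 2, 98).

    Hypothetical helper: assumes an optional 'epoch:' prefix, a numeric,
    dot-separated upstream part and an optional '-revision' suffix.
    """
    without_epoch = pkg_version.split(":", 1)[-1]
    upstream = without_epoch.split("-", 1)[0]
    return tuple(int(part) for part in upstream.split("."))


def meets_minimum(pkg_version: str) -> bool:
    """True if the upstream version is at least MINIMUM_CLVM."""
    return upstream_version(pkg_version) >= MINIMUM_CLVM


print("2.02.98" > "2.02.102")             # True: plain string order is misleading
print(meets_minimum("2.02.98-6ubuntu2"))  # False: the installed package is too old
print(meets_minimum("2.02.102"))          # True
```

On Ubuntu, `dpkg-query -W -f='${Version}\n' clvm` prints the installed version string that can be fed to such a check.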
- gfs2-utils 3.1.6-0ubuntu1
The LVM commands take minutes to complete:
root@nebula3:~# time vgs
Error locking on node 40a8e784: Command timed out
Error locking on node 40a8e784: Command timed out
Error locking on node 40a8e784: Command timed out
VG             #PV #LV #SN Attr      VSize    VFree
nebula3-vg       1   4   0 wz--n-  133,52g        0
one-fs           1   1   0 wz--nc    2,00t        0
one-production   1   0   0 wz--nc 1023,50g 1023,50g
real 5m40.233s
user 0m0.005s
sys 0m0.018s
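The LVM errors name the failing node in hexadecimal. corosync-quorumtool and dlm_tool print node IDs in decimal. Converting 40a8e784 gives 1084811140, which is the second cluster member, so the timeouts are reported for the peer node and not the local one. The sketch below shows that conversion. It assumes the error line always has the form "... node <hex>: <message>". The parsing helper and the "peer" label are hypothetical.

```python
# Decimal node IDs as printed by corosync-quorumtool / dlm_tool.
members = {1084811139: "nebula3 (local)", 1084811140: "peer"}


def node_from_lock_error(line: str) -> int:
    """Extract the node ID from an 'Error locking on node <hex>: ...' line.

    Hypothetical helper: takes the text after the last 'node', up to the
    first ':', and parses it as a hexadecimal integer.
    """
    hex_id = line.rsplit("node", 1)[1].split(":", 1)[0].strip()
    return int(hex_id, 16)


err = "Error locking on node 40a8e784: Command timed out"
nid = node_from_lock_error(err)
print(nid, members.get(nid, "unknown"))  # 1084811140 peer
```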
Do you know where I can look to find what's going on?
Here is some information:
root@nebula3:~# corosync-quorumtool
Quorum information
Date: Fri Sep 11 15:57:17 2015
Quorum provider: corosync_votequorum
Nodes: 2
Node ID: 1084811139
Ring ID: 1460
Quorate: Yes
Votequorum information
Expected votes: 2
Highest expected: 2
Total votes: 2
Quorum: 1
Flags: 2Node Quorate WaitForAll LastManStanding
It is better to use two_node: 1 in the votequorum section.
That implies wait_for_all and supersedes last_man_standing for two-node
clusters.
I'd also recommend setting clear_node_high_bit in the totem section; do you
use it?
But even better is to add a nodelist section to corosync.conf with
manually specified nodeids.
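For reference, a corosync.conf fragment combining those three suggestions might look like the one below. It is a sketch for corosync 2.x, not a complete file. The peer's ring0_addr and both nodeid values are hypothetical placeholders. The other totem settings (interface, transport and so on) are omitted. Per corosync.conf(5), clear_node_high_bit only matters when node IDs are generated automatically, which is why explicit nodeids in a nodelist are the more robust option. Whatever settings are chosen must be identical on both nodes.

```
totem {
    version: 2
    # Only relevant when nodeids are auto-generated: forces them positive.
    clear_node_high_bit: yes
    # (existing interface/transport settings stay as they are)
}

quorum {
    provider: corosync_votequorum
    # Two-node mode; implies wait_for_all.
    two_node: 1
}

nodelist {
    node {
        ring0_addr: nebula3
        nodeid: 1
    }
    node {
        # hypothetical: replace with the second node's hostname or IP
        ring0_addr: PEER_NODE
        nodeid: 2
    }
}
```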
Everything else looks fine...
Membership information
Nodeid Votes Name
1084811139 1 (local)
1084811140 1
root@nebula3:~# dlm_tool ls
dlm lockspaces
name datastores
id 0x1b61ba6a
flags 0x00000000
change member 2 joined 1 remove 0 failed 0 seq 1,1
members 1084811139 1084811140
name clvmd
id 0x4104eefa
flags 0x00000000
change member 2 joined 1 remove 0 failed 0 seq 1,1
members 1084811139 1084811140
root@nebula3:~# dlm_tool status
cluster nodeid 1084811139 quorate 1 ring seq 1460 1460
daemon now 11026 fence_pid 0
node 1084811139 M add 455 rem 0 fail 0 fence 0 at 0 0
node 1084811140 M add 455 rem 0 fail 0 fence 0 at 0 0
Linux-cluster mailing list