Why don't you remove expected_votes=3 and let the cluster calculate it automatically?
I am telling you this because I had so many problems with that setting.
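When expected_votes is not set, cman normally derives it from the sum of the configured node votes plus the quorum device votes, so it usually does not need to be pinned by hand. Just as a sketch (the grep pattern is my own), you can check what the running cluster is actually using with:

# cman_tool status | grep -Ei 'expected|total votes|quorum'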
2012/8/1 Gianluca Cecchi <gianluca.cecchi@xxxxxxxxx>
Hello,
testing a three node cluster + quorum disk and clvmd.
I was at CentOS 6.2 and I seem to remember being able to start a
single node. Correct?
Then I upgraded to CentOS 6.3 and had a working environment.
My config has
<cman expected_votes="3" quorum_dev_poll="240000" two_node="0"/>
At the moment two nodes are in another site that is powered down, and I
need to start a single-node configuration.
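(For completeness, a sketch of what I believe I could do if the single node did not reach quorum on its own, not something I have applied here: lower the expected votes on the surviving node once cman is up, e.g.

# cman_tool expected -e 1

or relax the init script's wait for quorum via CMAN_QUORUM_TIMEOUT in /etc/sysconfig/cman. In my case the quorum disk supplies the missing vote, so I left the configuration as it is.)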
When the node starts it waits for quorum, and when the quorum disk
becomes master it goes ahead:
# cman_tool nodes
Node Sts Inc Joined Name
0 M 0 2012-08-01 15:41:58 /dev/block/253:4
1 X 0 intrarhev1
2 X 0 intrarhev2
3 M 1420 2012-08-01 15:39:58 intrarhev3
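As a cross-check (assuming clustat from the rgmanager tools is usable here), the member state and the quorum disk also show up in:

# clustat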
But the process hangs at clvmd startup, in particular at the step
vgchange -aly.
The pid of the "service clvmd start" command is 9335:
# pstree -alp 9335
S24clvmd,9335 /etc/rc3.d/S24clvmd start
└─vgchange,9363 -ayl
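(Two checks that seem worth doing at this point, assuming the stock CentOS 6 paths: that the clvmd daemon itself is running, and that LVM is configured for cluster locking, i.e. locking_type = 3:

# ps -C clvmd -o pid,args
# grep locking_type /etc/lvm/lvm.conf
)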
# ll /proc/9363/fd/
total 0
lrwx------ 1 root root 64 Aug 1 15:44 0 -> /dev/console
lrwx------ 1 root root 64 Aug 1 15:44 1 -> /dev/console
lrwx------ 1 root root 64 Aug 1 15:44 2 -> /dev/console
lrwx------ 1 root root 64 Aug 1 15:44 3 -> /dev/mapper/control
lrwx------ 1 root root 64 Aug 1 15:44 4 -> socket:[1348167]
lr-x------ 1 root root 64 Aug 1 15:44 5 -> /dev/dm-3
# lsof -p 9363
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
vgchange 9363 root cwd DIR 104,3 4096 2 /
vgchange 9363 root rtd DIR 104,3 4096 2 /
vgchange 9363 root txt REG 104,3 971464 132238 /sbin/lvm
vgchange 9363 root mem REG 104,3 156872 210 /lib64/ld-2.12.so
vgchange 9363 root mem REG 104,3 1918016 569 /lib64/libc-2.12.so
vgchange 9363 root mem REG 104,3 22536 593 /lib64/libdl-2.12.so
vgchange 9363 root mem REG 104,3 24000 832 /lib64/libdevmapper-event.so.1.02
vgchange 9363 root mem REG 104,3 124624 750 /lib64/libselinux.so.1
vgchange 9363 root mem REG 104,3 272008 2060 /lib64/libreadline.so.6.0
vgchange 9363 root mem REG 104,3 138280 2469 /lib64/libtinfo.so.5.7
vgchange 9363 root mem REG 104,3 61648 1694 /lib64/libudev.so.0.5.1
vgchange 9363 root mem REG 104,3 251112 1489 /lib64/libsepol.so.1
vgchange 9363 root mem REG 104,3 229024 1726 /lib64/libdevmapper.so.1.02
vgchange 9363 root mem REG 253,7 99158576 17029 /usr/lib/locale/locale-archive
vgchange 9363 root mem REG 253,7 26060 134467 /usr/lib64/gconv/gconv-modules.cache
vgchange 9363 root 0u CHR 5,1 0t0 5218 /dev/console
vgchange 9363 root 1u CHR 5,1 0t0 5218 /dev/console
vgchange 9363 root 2u CHR 5,1 0t0 5218 /dev/console
vgchange 9363 root 3u CHR 10,58 0t0 5486 /dev/mapper/control
vgchange 9363 root 4u unix 0xffff880879b309c0 0t0 1348167 socket
vgchange 9363 root 5r BLK 253,3 0t143360 10773 /dev/dm-3
# strace -p 9363
Process 9363 attached - interrupt to quit
read(4,
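fd 4 is the unix socket in the listings above, which I assume is the connection to the local clvmd daemon, so vgchange appears to be waiting on clvmd/DLM rather than on the storage itself. Assuming the standard cluster3 utilities, the lockspaces and group state can be inspected with:

# dlm_tool ls
# group_tool ls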
Multipath seems OK in general, and for dm-3 in particular:
# multipath -l /dev/mapper/mpathd
mpathd (3600507630efe0b0c0000000000001181) dm-3 IBM,1750500
size=100G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=0 status=active
| |- 0:0:0:3 sdd 8:48 active undef running
| `- 1:0:0:3 sdl 8:176 active undef running
`-+- policy='round-robin 0' prio=0 status=enabled
|- 0:0:1:3 sdq 65:0 active undef running
`- 1:0:1:3 sdy 65:128 active undef running
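Since the map has queue_if_no_path set, a stalled path would make reads hang rather than fail, so (just a sketch, with the device name from above) a direct read test should confirm the LUN really answers:

# dd if=/dev/mapper/mpathd of=/dev/null bs=1M count=10 iflag=direct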
Currently I have
lvm2-2.02.95-10.el6.x86_64
lvm2-cluster-2.02.95-10.el6.x86_64
Startup is stuck as shown in the attached image.
Logs
messages:
Aug 1 15:46:14 udevd[663]: worker [9379] unexpectedly returned with status 0x0100
Aug 1 15:46:14 udevd[663]: worker [9379] failed while handling '/devices/virtual/block/dm-15'
dmesg
DLM (built Jul 20 2012 01:56:50) installed
dlm: Using TCP for communications
qdiskd
Aug 01 15:41:58 qdiskd Score sufficient for master operation (1/1; required=1); upgrading
Aug 01 15:43:03 qdiskd Assuming master role
corosync.log
Aug 01 15:41:58 corosync [CMAN ] quorum device registered
Aug 01 15:43:08 corosync [CMAN ] quorum regained, resuming activity
Aug 01 15:43:08 corosync [QUORUM] This node is within the primary component and will provide service.
Aug 01 15:43:08 corosync [QUORUM] Members[1]: 3
fenced.log
Aug 01 15:43:09 fenced fenced 3.0.12.1 started
Aug 01 15:43:09 fenced failed to get dbus connection
dlm_controld.log
Aug 01 15:43:10 dlm_controld dlm_controld 3.0.12.1 started
gfs_controld.log
Aug 01 15:43:11 gfs_controld gfs_controld 3.0.12.1 started
Am I missing anything simple?
Is it correct to say that clvmd can start even with only one node
active, provided that the node has quorum under the configured cluster
rules?
Or am I hitting a known bug/problem?
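(If it turns out to be a clvmd issue rather than a quorum issue, I suppose I can also stop the init script and run clvmd by hand with debugging enabled, something like:

# clvmd -d1

and then retry the vgchange from another terminal, but I have not tried that yet.)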
Thanks in advance,
Gianluca
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
--
this is my life and I live it for as long as God wills