Your quorum config:
nodes = 2 votes each * 3 nodes (6 node votes total)
qdisk = 3 votes
expected = 9, so the quorum threshold is 9/2 + 1 = 5 (integer math). A single node can maintain quorum on its own, since 2 (node) + 3 (qdisk) = 5 >= 5.
In a split-brain condition where a single node cannot talk to the other nodes, this could be disastrous.
Now, all that said: qdiskd, using a clustered LVM volume as yours appears to be, won't be able to start until the cluster is quorate.
Also, you might be running into a chicken-and-egg situation. Is your qdisk volume marked as clustered? I believe once you set the locking type to 3, all LVM activity requires clvmd to be running. If it's not marked as clustered, I don't think that will work either, since qdisk requires concurrent access across nodes. And if it is clustered, you have to wait for clvmd.
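A quick sanity check (just a sketch; "VG_QDISK" is only a placeholder for whatever VG actually holds your qdisk LV):

# grep locking_type /etc/lvm/lvm.conf
# vgs -o vg_name,vg_attr
# vgchange -cn VG_QDISK

locking_type = 3 means every activation goes through clvmd; in the vgs output a 'c' in the sixth attribute character marks a clustered VG, and vgchange -cn clears that flag (you may have to override the locking type temporarily if clvmd isn't running when you do it).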
It's unclear why you actually need a qdisk. If it's to keep the cluster up in single-node mode, then I'd have the qdisk start out with a minority vote (votes=1) and only raise it in a controlled situation where you are sure the other nodes are shut down completely. Remember, the purpose of quorum is to ensure that a majority rules, and your config violates that premise. Just sayin' :)
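For illustration only, reusing the quorumd attributes you posted below but with a single vote, that would just be:

<quorumd device="/dev/mapper/mpathquorum" interval="5"
         label="clrhevquorum" tko="24" votes="1"/>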
Does your cluster run without qdiskd configured?
Anyway, I hope this helps at least a little. If I am way off base, I apologize and will crawl back into my cave :)
Good luck
Corey
On Thu, Aug 2, 2012 at 4:50 AM, emmanuel segura <emi2fast@xxxxxxxxx> wrote:
if you think the problem is in lvm, put it in debug mode (see man lvm.conf)

2012/8/2 Gianluca Cecchi <gianluca.cecchi@xxxxxxxxx>:
On Wed, Aug 1, 2012 at 6:15 PM, Gianluca Cecchi wrote:
> On Wed, 1 Aug 2012 16:26:38 +0200 emmanuel segura wrote:
>> Why don't you remove expected_votes=3 and let the cluster calculate that automatically?
>
> Thanks for your answer Emmanuel, but cman starts correctly, while the
> problem seems related to
> vgchange -aly
> command hanging.
> But I tried that option too and the cluster hangs at the same point as before.
Further testing shows that the cluster is indeed quorate and the problem is
related to lvm...
I also tried following a more commonly used and cleaner configuration seen in
examples for 3 nodes + quorum daemon:
2 votes for each node
<clusternode name="nodeX" nodeid="X" votes="2">
3 votes for quorum disk
<quorumd device="/dev/mapper/mpathquorum" interval="5"
label="clrhevquorum" tko="24" votes="3">
with and without expected_votes="9" in <cman ... /> part
A single node plus the quorum disk alone should be ok (2+3 = 5 votes).
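For reference, the skeleton of the cluster.conf I'm testing is roughly this (fencing and rm sections omitted; the names of nodes 1 and 2 are placeholders here, only node 3 appears in the output below):

<cluster name="clrhev" config_version="51">
  <cman expected_votes="9"/>  <!-- also tried without expected_votes -->
  <clusternodes>
    <clusternode name="intrarhev1" nodeid="1" votes="2"/>
    <clusternode name="intrarhev2" nodeid="2" votes="2"/>
    <clusternode name="intrarhev3" nodeid="3" votes="2"/>
  </clusternodes>
  <quorumd device="/dev/mapper/mpathquorum" interval="5"
           label="clrhevquorum" tko="24" votes="3"/>
</cluster>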
After cman starts and quorumd is not master yet:
# cman_tool status
Version: 6.2.0
Config Version: 51
Cluster Name: clrhev
Cluster Id: 43203
Cluster Member: Yes
Cluster Generation: 1428
Membership state: Cluster-Member
Nodes: 1
Expected votes: 9
Total votes: 2
Node votes: 2
Quorum: 5 Activity blocked
Active subsystems: 4
Flags:
Ports Bound: 0 178
Node name: intrarhev3
Node ID: 3
Multicast addresses: 239.192.168.108
Node addresses: 192.168.16.30
Then, once qdiskd registers its votes:
# cman_tool status
Version: 6.2.0
Config Version: 51
Cluster Name: clrhev
Cluster Id: 43203
Cluster Member: Yes
Cluster Generation: 1428
Membership state: Cluster-Member
Nodes: 1
Expected votes: 9
Quorum device votes: 3
Total votes: 5
Node votes: 2
Quorum: 5
Active subsystems: 4
Flags:
Ports Bound: 0 178
Node name: intrarhev3
Node ID: 3
Multicast addresses: 239.192.168.108
Node addresses: 192.168.16.30
And startup continues up to the clvmd step.
In this phase, while clvmd startup hangs forever, I have:
# dlm_tool ls
dlm lockspaces
name clvmd
id 0x4104eefa
flags 0x00000000
change member 1 joined 1 remove 0 failed 0 seq 1,1
members 3
# ps -ef|grep lv
root 3573 2593 0 01:05 ? 00:00:00 /bin/bash
/etc/rc3.d/S24clvmd start
root 3578 1 0 01:05 ? 00:00:00 clvmd -T30
root 3620 1 0 01:05 ? 00:00:00 /sbin/lvm pvscan
--cache --major 253 --minor 13
root 3804 3322 0 01:09 pts/0 00:00:00 grep lv
# ps -ef|grep vg
root 3601 3573 0 01:05 ? 00:00:00 /sbin/vgchange -ayl
root 3808 3322 0 01:09 pts/0 00:00:00 grep vg
# ps -ef|grep lv
root 3573 2593 0 01:05 ? 00:00:00 /bin/bash
/etc/rc3.d/S24clvmd start
root 3578 1 0 01:05 ? 00:00:00 clvmd -T30
root 4008 3322 0 01:13 pts/0 00:00:00 grep lv
# ps -ef|grep 3578
root 3578 1 0 01:05 ? 00:00:00 clvmd -T30
root 4017 3322 0 01:13 pts/0 00:00:00 grep 3578
It remains stuck at:
# service clvmd start
Starting clvmd:
Activating VG(s): 3 logical volume(s) in volume group "VG_VIRT02" now active
Is there any way to debug clvmd?
I suppose it communicates over the intracluster network, correct?
Would a tcpdump capture be of any help?
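Just a sketch of what I plan to try next (clvmd's -d switch and the lvm.conf log section are documented in clvmd(8) and lvm.conf(5); the log file path is only an example):

# service clvmd stop
# clvmd -d1 -T30

should keep clvmd in the foreground with debug output on stderr (if I read clvmd(8) right), and

# dlm_tool lockdebug clvmd

should show whether the vgchange is stuck waiting on a DLM lock. For the lvm commands themselves I can raise logging in the log section of lvm.conf:

log {
    verbose = 3
    file = "/var/log/lvm2-debug.log"
    level = 7
    activation = 1
}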
Has anyone already moved to 6.3 (on RHEL and/or CentOS) with everything
working ok with clvmd?
BTW: I also tried lvmetad, which is a tech preview in 6.3, enabling its
service and putting "use_lvmetad = 1" in lvm.conf, but without luck...
Thanks in advance
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster