Re: clvmd problems with centos 6.3 or normal clvmd behaviour?

Corey Kovacs <corey.kovacs@xxxxxxxxx> · Thu, 2 Aug 2012 08:17:51 -0600

Yup, I missed the part where you said you only have a single node. 

To be clear, the portion of the docs you cite below is exactly why you need to be careful about how many votes you give to the qdiskd. It should be a tie breaker. You are using it to bring up a 3 node cluster in which only a single node exists. This is fine in a testing environment, but is not recommended in a production setup. Once your other nodes are in place, you won't need the qdiskd. If you decide to keep it around, be very careful with its use. It's really only meant for clusters in which you have an even number of actual nodes.



Sorry I don't have more time this morning to look at this, but I am sure someone else will.

Take care

-C

On Thu, Aug 2, 2012 at 7:55 AM, Gianluca Cecchi <gianluca.cecchi@xxxxxxxxx> wrote:

On Thu, 2 Aug 2012 07:07:25 -0600 Corey Kovacs wrote:

> I might be reading this wrong but just in case, I thought I'd point this out.

>

[snip]

> A single node can maintain quorum since 2+3>(9/2).

> In a split brain condition where a single node cannot talk to the other nodes, this could be disastrous.
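
To make the vote arithmetic concrete, here is a minimal Python sketch. It assumes the usual cman rule that quorum is a strict majority of the expected votes (expected_votes // 2 + 1), and it infers the vote counts from the "2+3>(9/2)" figure: three nodes with 2 votes each plus a quorum disk with 3 votes, 9 expected votes in total. The function and variable names are made up for illustration and are not taken from the actual cluster.conf.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Votes needed for quorum: a strict majority of the expected votes."""
    return expected_votes // 2 + 1


def is_quorate(votes_present: int, expected_votes: int) -> bool:
    """True if the votes currently visible reach the quorum threshold."""
    return votes_present >= quorum_threshold(expected_votes)


# Assumed from "2+3>(9/2)": 3 nodes x 2 votes + 3 qdisk votes = 9 expected.
NODE_VOTES = 2
QDISK_VOTES = 3
EXPECTED = 3 * NODE_VOTES + QDISK_VOTES

single_node_with_qdisk = is_quorate(NODE_VOTES + QDISK_VOTES, EXPECTED)
single_node_alone = is_quorate(NODE_VOTES, EXPECTED)
two_nodes_no_qdisk = is_quorate(2 * NODE_VOTES, EXPECTED)

print(single_node_with_qdisk, single_node_alone, two_nodes_no_qdisk)
# prints: True False False
```

With these numbers a single node plus the quorum disk reaches the threshold of 5, while two nodes without the quorum disk (4 votes) do not, which is why the weight given to qdiskd matters so much.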



Thanks for your input, Corey.

As I said before, at this moment I'll have only one node on a site so

I'm also tweaking config to be able to work with one node alone



Anyway I refer to this sentence in manual, also for more than two

nodes configuration (example pertains to a 13 nodes cluster):



"

A cluster must maintain quorum to prevent split-brain issues. If

quorum was not enforced, a communication error on that same

thirteen-node cluster may cause a situation where six nodes are

operating on the shared storage, while another six nodes are also

operating on it, independently. Because of the communication error,

the two partial-clusters would overwrite areas of the disk and corrupt

the file system. With quorum rules enforced, only one of the partial

clusters can use the shared storage, thus protecting data integrity.

Quorum doesn't prevent split-brain situations, but it does decide who

is dominant and allowed to function in the cluster. Should split-brain

occur, quorum prevents more than one cluster group from doing

anything.

"

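The point of the quoted paragraph can be checked with a few lines of Python. This is only a sketch: it assumes one vote per node and a strict-majority threshold (expected_votes // 2 + 1), and the function name is hypothetical.

```python
def quorate_partitions(partition_votes, expected_votes):
    """Return the vote counts of the partitions that reach quorum."""
    threshold = expected_votes // 2 + 1
    return [votes for votes in partition_votes if votes >= threshold]


# Thirteen one-vote nodes: a 6 / 6 split with one node unreachable.
even_split = quorate_partitions([6, 6, 1], 13)
# The same cluster split 7 / 6.
uneven_split = quorate_partitions([7, 6], 13)

print(even_split, uneven_split)
# prints: [] [7]
```

Because the threshold is more than half of the expected votes, two disjoint partitions can never both reach it, so at most one group keeps access to the shared storage.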


This said, in my case my problem is not with quorum, that is gained

when quorum disk becomes master, but with clvmd freezing without

showing any error

As suggested I set up logging for both cluster and lvm.



I also configured lvmetad



The diff between previous lvm.conf and current for further tests is this:

# diff -u lvm.conf lvm.conf.pre020812

--- lvm.conf    2012-08-02 14:48:31.172565731 +0200

+++ lvm.conf.pre020812  2012-08-02 01:33:55.878511113 +0200

@@ -232,8 +232,7 @@



     # Controls the messages sent to stdout or stderr.

     # There are three levels of verbosity, 3 being the most verbose.

-    #verbose = 0

-    verbose = 2

+    verbose = 0



     # Should we send log messages through syslog?

     # 1 is yes; 0 is no.

@@ -242,7 +241,6 @@

     # Should we log error and debug messages to a file?

     # By default there is no log file.

     #file = "/var/log/lvm2.log"

-    file = "/var/log/lvm2.log"



     # Should we overwrite the log file each time the program is run?

     # By default we append.

@@ -251,8 +249,7 @@

     # What level of log messages should we send to the log file and/or syslog?

     # There are 6 syslog-like log levels currently in use - 2 to 7 inclusive.

     # 7 is the most verbose (LOG_DEBUG).

-    #level = 0

-    level = 4

+    level = 0



     # Format of output messages

     # Whether or not (1 or 0) to indent messages according to their severity

@@ -422,8 +419,7 @@

     # Check whether CRC is matching when parsed VG is used multiple times.

     # This is useful to catch unexpected internal cached volume group

     # structure modification. Please only enable for debugging.

-    #detect_internal_vg_cache_corruption = 0

-    detect_internal_vg_cache_corruption = 1

+    detect_internal_vg_cache_corruption = 0



     # If set to 1, no operations that change on-disk metadata will be permitted.

     # Additionally, read-only commands that encounter metadata in need of repair

@@ -483,8 +479,7 @@

     # libdevmapper.  Useful for debugging problems with activation.

     # Some of the checks may be expensive, so it's best to use this

     # only when there seems to be a problem.

-    #checks = 0

-    checks = 1

+    checks = 0



     # Set to 0 to disable udev synchronisation (if compiled into the binaries).

     # Processes will not wait for notification from udev.



cluster.conf changes

# diff cluster.conf cluster.conf.51

2,6c2

< <cluster config_version="52" name="clrhev">

<       <dlm log_debug="1" plock_debug="1"/>

<       <logging>

<               <logging_daemon name="qdiskd" debug="on"/>

<       </logging>

---

> <cluster config_version="51" name="clrhev">



I am attaching two files:

lvm2.log, with a mark separating the entries before and after the clvmd start command was issued

clvmd start output.txt, which is the output of the "service clvmd start" command



To be able to do so, I started in single user mode and then started

the services one at a time as in



/etc/rc.d/rc3.d/S*



but starting the ssh daemon earlier, so that I'm able to log in remotely.

In fact after clvmd freezes I can only run a pair of sync commands and

power off....
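
The manual start order described above could be worked out with something like the following Python sketch. It only computes an order and starts nothing. The script names in the example list (S10network, S21cman, S24clvmd, S55sshd) are illustrative assumptions, not a listing from this machine, and start_order is a hypothetical helper.

```python
import re


def start_order(script_names, early="sshd", before="clvmd"):
    """Sort S* init scripts by priority, moving `early` just ahead of `before`."""
    pattern = re.compile(r"^S(\d\d)(.+)$")
    parsed = []
    for name in script_names:
        match = pattern.match(name)
        if match:  # K* (kill) scripts and anything else are skipped
            parsed.append((int(match.group(1)), match.group(2), name))
    parsed.sort()

    services = [service for _, service, _ in parsed]
    ordered = [name for _, _, name in parsed]
    if early in services and before in services:
        i_early = services.index(early)
        i_before = services.index(before)
        if i_early > i_before:
            ordered.insert(i_before, ordered.pop(i_early))
    return ordered


# Illustrative names only; on a real system list /etc/rc.d/rc3.d instead.
example = ["S55sshd", "S21cman", "S24clvmd", "K50foo", "S10network"]
order = start_order(example)
print(order)
# prints: ['S10network', 'S21cman', 'S55sshd', 'S24clvmd']
```

Each script in the resulting list would then be run by hand with "start", so that a remote login is already available when clvmd hangs.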



If I'm not missing something stupid, I can also file a report in the

CentOS bug tracker, and then someone can eventually report it upstream if

it is reproducible.



Gianluca


--

Linux-cluster mailing list

Linux-cluster@xxxxxxxxxx

https://www.redhat.com/mailman/listinfo/linux-cluster

