To be clear, the portion of the docs you cite below is exactly why you need to be careful about how many votes you give to qdiskd. It should be a tie breaker. You are using it to bring up a 3-node cluster in which only a single node exists. This is fine in a testing environment, but is not recommended in a production setup. Once your other nodes are in place, you won't need qdiskd. If you decide to keep it around, be very careful with its use. It's really only meant for clusters in which you have an even number of actual nodes.
Sorry I don't have more time this morning to look at this, but I am sure someone else will.
Take care
-C
On Thu, Aug 2, 2012 at 7:55 AM, Gianluca Cecchi <gianluca.cecchi@xxxxxxxxx> wrote:
On Thu, 2 Aug 2012 07:07:25 -0600 Corey Kovacs wrote:
> I might be reading this wrong but just in case, I thought I'd point this out.
[snip]
> A single node can maintain quorum since 2+3>(9/2).
> In a split brain condition where a single node cannot talk to the other nodes, this could be disastrous.
Thanks for your input, Corey.
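The arithmetic in the quoted line can be checked with a small sketch. It assumes, reading 2+3>(9/2) at face value, three nodes with 2 votes each plus a quorum disk with 3 votes (9 expected votes in total), and the usual majority rule of floor(expected/2) + 1. The function and variable names are made up for illustration; they are not part of cman or qdiskd.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Votes needed for a strict majority: floor(expected / 2) + 1."""
    return expected_votes // 2 + 1


# Assumed layout, inferred from "2+3>(9/2)": 3 nodes x 2 votes + 3 qdisk votes.
NODE_VOTES = 2
NODE_COUNT = 3
QDISK_VOTES = 3

expected = NODE_VOTES * NODE_COUNT + QDISK_VOTES  # total expected votes
needed = quorum_threshold(expected)               # majority threshold

lone_node_with_qdisk = NODE_VOTES + QDISK_VOTES   # one node that holds the qdisk
lone_node_without_qdisk = NODE_VOTES              # one node that lost the qdisk

print("expected votes:", expected, "needed for quorum:", needed)
print("lone node + qdisk quorate:", lone_node_with_qdisk >= needed)
print("lone node alone quorate:", lone_node_without_qdisk >= needed)
```

Under these assumptions a single node that holds the quorum disk reaches the threshold on its own, which is the situation Corey is warning about.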
As I said before, at the moment I'll have only one node on a site, so
I'm also tweaking the config to be able to work with one node alone.
Anyway, I refer to this sentence in the manual, which also applies to
configurations with more than two nodes (the example pertains to a 13-node cluster):
"
A cluster must maintain quorum to prevent split-brain issues. If
quorum was not enforced, quorum, a communication error on that same
thirteen-node cluster may cause a situation where six nodes are
operating on the shared storage, while another six nodes are also
operating on it, independently. Because of the communication error,
the two partial-clusters would overwrite areas of the disk and corrupt
the file system. With quorum rules enforced, only one of the partial
clusters can use the shared storage, thus protecting data integrity.
Quorum doesn't prevent split-brain situations, but it does decide who
is dominant and allowed to function in the cluster. Should split-brain
occur, quorum prevents more than one cluster group from doing
anything.
"
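The manual's thirteen-node example can be sketched the same way. This is only an illustration, assuming one vote per node, no quorum disk, and a simple majority rule; `quorate_partitions` is a hypothetical helper, not a cluster API.

```python
def quorate_partitions(expected_votes, partition_votes):
    """Return, for each partition, whether it holds a strict majority of votes."""
    needed = expected_votes // 2 + 1
    return [votes >= needed for votes in partition_votes]


# Thirteen nodes, one vote each (assumed): a majority is 7 votes.
six_six = quorate_partitions(13, [6, 6])      # the two groups from the manual
seven_six = quorate_partitions(13, [7, 6])    # the remaining node joins one side

print("6/6 split:", six_six)
print("7/6 split:", seven_six)
```

Because a strict majority is required, at most one partition can ever be quorate, whatever the split.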
That said, in my case the problem is not with quorum, which is gained
when the quorum disk becomes master, but with clvmd freezing without
showing any error.
As suggested, I set up logging for both cluster and lvm.
I also configured lvmetad.
The diff between the previous lvm.conf and the current one for further tests is this:
# diff -u lvm.conf lvm.conf.pre020812
--- lvm.conf 2012-08-02 14:48:31.172565731 +0200
+++ lvm.conf.pre020812 2012-08-02 01:33:55.878511113 +0200
@@ -232,8 +232,7 @@
# Controls the messages sent to stdout or stderr.
# There are three levels of verbosity, 3 being the most verbose.
- #verbose = 0
- verbose = 2
+ verbose = 0
# Should we send log messages through syslog?
# 1 is yes; 0 is no.
@@ -242,7 +241,6 @@
# Should we log error and debug messages to a file?
# By default there is no log file.
#file = "/var/log/lvm2.log"
- file = "/var/log/lvm2.log"
# Should we overwrite the log file each time the program is run?
# By default we append.
@@ -251,8 +249,7 @@
# What level of log messages should we send to the log file and/or syslog?
# There are 6 syslog-like log levels currently in use - 2 to 7 inclusive.
# 7 is the most verbose (LOG_DEBUG).
- #level = 0
- level = 4
+ level = 0
# Format of output messages
# Whether or not (1 or 0) to indent messages according to their severity
@@ -422,8 +419,7 @@
# Check whether CRC is matching when parsed VG is used multiple times.
# This is useful to catch unexpected internal cached volume group
# structure modification. Please only enable for debugging.
- #detect_internal_vg_cache_corruption = 0
- detect_internal_vg_cache_corruption = 1
+ detect_internal_vg_cache_corruption = 0
# If set to 1, no operations that change on-disk metadata will be permitted.
# Additionally, read-only commands that encounter metadata in need of repair
@@ -483,8 +479,7 @@
# libdevmapper. Useful for debugging problems with activation.
# Some of the checks may be expensive, so it's best to use this
# only when there seems to be a problem.
- #checks = 0
- checks = 1
+ checks = 0
# Set to 0 to disable udev synchronisation (if compiled into the binaries).
# Processes will not wait for notification from udev.
cluster.conf changes
# diff cluster.conf cluster.conf.51
2,6c2
< <cluster config_version="52" name="clrhev">
< <dlm log_debug="1" plock_debug="1"/>
< <logging>
< <logging_daemon name="qdiskd" debug="on"/>
< </logging>
---
> <cluster config_version="51" name="clrhev">
I am attaching two files:
lvm2.log, with a mark separating the entries before and after the clvmd start command was issued
clvmd start output.txt, which is the output during the "service clvmd start" command
To be able to do so, I started in single-user mode and then started
the services one at a time as in
/etc/rc.d/rc3.d/S*
but starting the ssh daemon earlier, so that I'm able to log in remotely.
In fact, after clvmd freezes I can only run a couple of sync commands and
power off....
If I'm not missing something stupid, I can also file a bug in the
CentOS bug tracker, and then eventually someone will report it upstream if
it is reproducible.
Gianluca
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster