RHEL5 clvmd hangs only after a node crashes...


Hi everybody,

        I have successfully installed and (almost successfully) configured the RHEL5 Cluster Suite on a two-node cluster, which will hopefully soon become a three-node cluster - my boss's euros willing; that's why I also configured it with qdisk (on a raw partition).
        Two GFS (v. 1) filesystems are shared by both nodes.
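
        For completeness, here is a quick sketch of how the quorum disk and the membership can be checked with the stock cman utilities - the exact options may differ slightly between releases, so see the man pages:

        # cman_tool status
              - shows quorum, expected votes and the quorum device
        # cman_tool nodes
              - lists the member nodes and their state
        # mkqdisk -L
              - lists the quorum disk headers found on shared storage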
       
        Well, everything really works like a breeze (HP iLO power fencing obviously included), even when I _NORMALLY_ reboot any node.
       
        Problems arise only after a node _CRASHES_ (you see, I love performing cold reboots in test environments); here is exactly what happens:
       
        1) Node 2 crashes;
        2) Node 1 successfully fences node 2: I can go on working on GFS file systems after a freeze lasting less than one second;
        3) While booting up, node 2's startup sequence runs fine: the cman services start successfully (I even get 'Starting fencing... [OK]' !!!), but when it comes to clvmd the music definitely changes: the dlm connections are established successfully, but then the whole node hangs on 'Starting clvmd... '. Debugging the clvmd init script, I found that the problem is due to the vgscan command, which hangs indefinitely on something like 'Locking vg_flash_1... '. I can't find any particular error about this in my logs (see the commands sketched right after this list).
        4) Activity on GFS on node 1 goes on normally.
        5) The only way I've found to recover the whole cluster from this situation is to restart both nodes together, but this is nonsense in a cluster architecture...
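
        A sketch of what I would check while node 2 sits on 'Starting clvmd... ' - these are the standard RHEL5 cman / lvm2-cluster tools, but I'm quoting the options from memory, so double-check the man pages:

        # cman_tool services
              - on both nodes: shows the fence, dlm and gfs groups and flags any
                group stuck in a state transition (a wedged fence or dlm group is
                what typically keeps clvmd waiting)
        # group_tool ls
              - the same information from groupd's point of view
        # clvmd -d
              - stop the hanging startup and run the daemon in the foreground
                with debugging, to see which lock request never gets an answer
        # vgscan -vvvv
              - as lvm.conf itself suggests, shows exactly where the scan blocks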
       
        Hoping I simply made some trivial mistake,
        I would greatly appreciate any help or pointers from you gurus to solve my problem.
        Thanks in advance to anyone who can make me - and my boss too - happy again with my cluster... indeed!
       
        Tyzan
       
       


______________________________________________________________________________________________________________
Linux xxxxxxxxxxxxxxxx 2.6.18-8.1.8.el5 #1 SMP Tue Jul 10 06:39:17 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux

lvm2-cluster-2.02.16-3.el5
kmod-gfs-0.1.16-5.2.6.18_8.1.8.el5
gfs2-utils-0.1.25-1.el5
gfs-utils-0.1.11-3.el5
cman-2.0.64-1.0.1.el5
rgmanager-2.0.24-1.el5

# cat /etc/lvm/lvm.conf

# This is an example configuration file for the LVM2 system.
# It contains the default settings that would be used if there was no
# /etc/lvm/lvm.conf file.
#
# Refer to 'man lvm.conf' for further information including the file layout.
#
# To put this file in a different directory and override /etc/lvm set
# the environment variable LVM_SYSTEM_DIR before running the tools.


# This section allows you to configure which block devices should
# be used by the LVM system.
devices {

    # Where do you want your volume groups to appear ?
    dir = "/dev"

    # An array of directories that contain the device nodes you wish
    # to use with LVM2.
    scan = [ "/dev" ]

    # A filter that tells LVM2 to only use a restricted set of devices.
    # The filter consists of an array of regular expressions.  These
    # expressions can be delimited by a character of your choice, and
    # prefixed with either an 'a' (for accept) or 'r' (for reject).
    # The first _expression_ found to match a device name determines if
    # the device will be accepted or rejected (ignored).  Devices that
    # don't match any patterns are accepted.

    # Be careful if there are symbolic links or multiple filesystem
    # entries for the same device as each name is checked separately against
    # the list of patterns.  The effect is that if any name matches any 'a'
    # pattern, the device is accepted; otherwise if any name matches any 'r'
    # pattern it is rejected; otherwise it is accepted.

    # Don't have more than one filter line active at once: only one gets used.

    # Run vgscan after you change this parameter to ensure that
    # the cache file gets regenerated (see below).
    # If it doesn't do what you expect, check the output of 'vgscan -vvvv'.


    # By default we accept every block device:
    filter = [ "a/.*/" ]

    # Exclude the cdrom drive
    # filter = [ "r|/dev/cdrom|" ]

    # When testing I like to work with just loopback devices:
    # filter = [ "a/loop/", "r/.*/" ]

    # Or maybe all loops and ide drives except hdc:
    # filter =[ "a|loop|", "r|/dev/hdc|", "a|/dev/ide|", "r|.*|" ]

    # Use anchors if you want to be really specific
    # filter = [ "a|^/dev/hda8$|", "r/.*/" ]

    # The results of the filtering are cached on disk to avoid
    # rescanning dud devices (which can take a very long time).  By
    # default this cache file is hidden in the /etc/lvm directory.
    # It is safe to delete this file: the tools regenerate it.
    cache = "/etc/lvm/.cache"

    # You can turn off writing this cache file by setting this to 0.
    write_cache_state = 1

    # Advanced settings.

    # List of pairs of additional acceptable block device types found
    # in /proc/devices with maximum (non-zero) number of partitions.
    # types = [ "fd", 16 ]

    # If sysfs is mounted (2.6 kernels) restrict device scanning to
    # the block devices it believes are valid.
    # 1 enables; 0 disables.
    sysfs_scan = 1     

    # By default, LVM2 will ignore devices used as components of
    # software RAID (md) devices by looking for md superblocks.
    # 1 enables; 0 disables.
    md_component_detection = 1
}

# This section allows you to configure the nature of the
# information that LVM2 reports.
log {

    # Controls the messages sent to stdout or stderr.
    # There are three levels of verbosity, 3 being the most verbose.
    verbose = 0

    # Should we send log messages through syslog?
    # 1 is yes; 0 is no.
    syslog = 1

    # Should we log error and debug messages to a file?
    # By default there is no log file.
    #file = "/var/log/lvm2.log"

    # Should we overwrite the log file each time the program is run?
    # By default we append.
    overwrite = 0

    # What level of log messages should we send to the log file and/or syslog?
    # There are 6 syslog-like log levels currently in use - 2 to 7 inclusive.
    # 7 is the most verbose (LOG_DEBUG).
    level = 0
   
    # Format of output messages
    # Whether or not (1 or 0) to indent messages according to their severity
    indent = 1

    # Whether or not (1 or 0) to display the command name on each line output
    command_names = 0

    # A prefix to use before the message text (but after the command name,
    # if selected).  Default is two spaces, so you can see/grep the severity
    # of each message.
    prefix = "  "

    # To make the messages look similar to the original LVM tools use:
    #   indent = 0
    #   command_names = 1
    #   prefix = " -- "

    # Set this if you want log messages during activation.
    # Don't use this in low memory situations (can deadlock).
    # activation = 0
}

# Configuration of metadata backups and archiving.  In LVM2 when we
# talk about a 'backup' we mean making a copy of the metadata for the
# *current* system.  The 'archive' contains old metadata configurations.
# Backups are stored in a human readable text format.
backup {

    # Should we maintain a backup of the current metadata configuration ?
    # Use 1 for Yes; 0 for No.
    # Think very hard before turning this off!
    backup = 1

    # Where shall we keep it ?
    # Remember to back up this directory regularly!
    backup_dir = "/etc/lvm/backup"

    # Should we maintain an archive of old metadata configurations.
    # Use 1 for Yes; 0 for No.
    # On by default.  Think very hard before turning this off.
    archive = 1

    # Where should archived files go ?
    # Remember to back up this directory regularly!
    archive_dir = "/etc/lvm/archive"
   
    # What is the minimum number of archive files you wish to keep ?
    retain_min = 10

    # What is the minimum time you wish to keep an archive file for ?
    retain_days = 30
}

# Settings for the running LVM2 in shell (readline) mode.
shell {

    # Number of lines of history to store in ~/.lvm_history
    history_size = 100
}


# Miscellaneous global LVM2 settings
global {
    library_dir = "/usr/lib64"
   
    # The file creation mask for any files and directories created.
    # Interpreted as octal if the first digit is zero.
    umask = 077

    # Allow other users to read the files
    #umask = 022

    # Enabling test mode means that no changes to the on disk metadata
    # will be made.  Equivalent to having the -t option on every
    # command.  Defaults to off.
    test = 0

    # Whether or not to communicate with the kernel device-mapper.
    # Set to 0 if you want to use the tools to manipulate LVM metadata
    # without activating any logical volumes.
    # If the device-mapper kernel driver is not present in your kernel
    # setting this to 0 should suppress the error messages.
    activation = 1

    # If we can't communicate with device-mapper, should we try running
    # the LVM1 tools?
    # This option only applies to 2.4 kernels and is provided to help you
    # switch between device-mapper kernels and LVM1 kernels.
    # The LVM1 tools need to be installed with .lvm1 suffices
    # e.g. vgscan.lvm1 and they will stop working after you start using
    # the new lvm2 on-disk metadata format.
    # The default value is set when the tools are built.
    # fallback_to_lvm1 = 0

    # The default metadata format that commands should use - "lvm1" or "lvm2".
    # The command line override is -M1 or -M2.
    # Defaults to "lvm1" if compiled in, else "lvm2".
    # format = "lvm1"

    # Location of proc filesystem
    proc = "/proc"

    # Type of locking to use. Defaults to local file-based locking (1).
    # Turn locking off by setting to 0 (dangerous: risks metadata corruption
    # if LVM2 commands get run concurrently).
    # Type 2 uses the external shared library locking_library.
    # Type 3 uses built-in clustered locking.
    locking_type = 3

    # If using external locking (type 2) and initialisation fails,
    # with this set to 1 an attempt will be made to use the built-in
    # clustered locking.
    # If you are using a customised locking_library you should set this to 0.
    fallback_to_clustered_locking = 1

    # If an attempt to initialise type 2 or type 3 locking failed, perhaps
    # because cluster components such as clvmd are not running, with this set
    # to 1 an attempt will be made to use local file-based locking (type 1).
    # If this succeeds, only commands against local volume groups will proceed.
    # Volume Groups marked as clustered will be ignored.
    fallback_to_local_locking = 1

    # Local non-LV directory that holds file-based locks while commands are
    # in progress.  A directory like /tmp that may get wiped on reboot is OK.
    locking_dir = "/var/lock/lvm"

    # Other entries can go here to allow you to load shared libraries
    # e.g. if support for LVM1 metadata was compiled as a shared library use
    #   format_libraries = "liblvm2format1.so"
    # Full pathnames can be given.

    # Search this directory first for shared libraries.
    #   library_dir = "/lib"

    # The external locking library to load if locking_type is set to 2.
    #   locking_library = "liblvm2clusterlock.so"
}

activation {
    # Device used in place of missing stripes if activating incomplete volume.
    # For now, you need to set this up yourself first (e.g. with 'dmsetup')
    # For example, you could make it return I/O errors using the 'error'
    # target or make it return zeros.
    missing_stripe_filler = "/dev/ioerror"

    # How much stack (in KB) to reserve for use while devices suspended
    reserved_stack = 256

    # How much memory (in KB) to reserve for use while devices suspended
    reserved_memory = 8192

    # Nice value used while devices suspended
    process_priority = -18

    # If volume_list is defined, each LV is only activated if there is a
    # match against the list.
    #   "vgname" and "vgname/lvname" are matched exactly.
    #   "@tag" matches any tag set in the LV or VG.
    #   "@*" matches if any tag defined on the host is also set in the LV or VG
    #
    # volume_list = [ "vg1", "vg2/lvol1", "@tag1", "@*" ]

    # Size (in KB) of each copy operation when mirroring
    mirror_region_size = 512

    # 'mirror_image_fault_policy' and 'mirror_log_fault_policy' define
    # how a device failure affecting a mirror is handled.
    # A mirror is composed of mirror images (copies) and a log.
    # A disk log ensures that a mirror does not need to be re-synced
    # (all copies made the same) every time a machine reboots or crashes.
    #
    # In the event of a failure, the specified policy will be used to
    # determine what happens:
    #
    # "remove" - Simply remove the faulty device and run without it.  If
    #            the log device fails, the mirror would convert to using
    #            an in-memory log.  This means the mirror will not
    #            remember its sync status across crashes/reboots and
    #            the entire mirror will be re-synced.  If a
    #            mirror image fails, the mirror will convert to a
    #            non-mirrored device if there is only one remaining good
    #            copy.
    #
    # "allocate" - Remove the faulty device and try to allocate space on
    #            a new device to be a replacement for the failed device.
    #            Using this policy for the log is fast and maintains the
    #            ability to remember sync state through crashes/reboots.
    #            Using this policy for a mirror device is slow, as it
    #            requires the mirror to resynchronize the devices, but it
    #            will preserve the mirror characteristic of the device.
    #            This policy acts like "remove" if no suitable device and
    #            space can be allocated for the replacement.
    #            Currently this is not implemented properly and behaves
    #            similarly to:
    #
    # "allocate_anywhere" - Operates like "allocate", but it does not
    #            require that the new space being allocated be on a
    #            device is not part of the mirror.  For a log device
    #            failure, this could mean that the log is allocated on
    #            the same device as a mirror device.  For a mirror
    #            device, this could mean that the mirror device is
    #            allocated on the same device as another mirror device.
    #            This policy would not be wise for mirror devices
    #            because it would break the redundant nature of the
    #            mirror.  This policy acts like "remove" if no suitable
    #            device and space can be allocated for the replacement.

    mirror_log_fault_policy = "allocate"
    mirror_device_fault_policy = "remove"
}


####################
# Advanced section #
####################

# Metadata settings
#
# metadata {
    # Default number of copies of metadata to hold on each PV.  0, 1 or 2.
    # You might want to override it from the command line with 0
    # when running pvcreate on new PVs which are to be added to large VGs.

    # pvmetadatacopies = 1

    # Approximate default size of on-disk metadata areas in sectors.
    # You should increase this if you have large volume groups or
    # you want to retain a large on-disk history of your metadata changes.

    # pvmetadatasize = 255

    # List of directories holding live copies of text format metadata.
    # These directories must not be on logical volumes!
    # It's possible to use LVM2 with a couple of directories here,
    # preferably on different (non-LV) filesystems, and with no other
    # on-disk metadata (pvmetadatacopies = 0). Or this can be in
    # addition to on-disk metadata areas.
    # The feature was originally added to simplify testing and is not
    # supported under low memory situations - the machine could lock up.
    #
    # Never edit any files in these directories by hand unless you
    # you are absolutely sure you know what you are doing! Use
    # the supplied toolset to make changes (e.g. vgcfgrestore).

    # dirs = [ "/etc/lvm/metadata", "/mnt/disk2/lvm/metadata2" ]
#}

# Event daemon
#
# dmeventd {
    # mirror_library is the library used when monitoring a mirror device.
    #
    # "libdevmapper-event-lvm2mirror.so" attempts to recover from failures.
    # It removes failed devices from a volume group and reconfigures a
    # mirror as necessary.
    #
    # mirror_library = "libdevmapper-event-lvm2mirror.so"
#}
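
# The parts of lvm.conf above that matter here should be locking_type = 3 and
# the two fallback_to_*_locking options. A quick sanity check of what the tools
# really pick up at runtime - just a sketch, field names may vary slightly by
# release:
#
#   lvm dumpconfig | grep locking
#   vgs -o vg_name,vg_attr
#       - a 'c' in the attribute column confirms the VG is flagged as clustered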



# pvdisplay

  --- Physical volume ---
  PV Name               /dev/sdd1
  VG Name               vg_flash
  PV Size               200.00 GB / not usable 1.34 MB
  Allocatable           yes (but full)
  PE Size (KByte)       4096
  Total PE              51199
  Free PE               0
  Allocated PE          51199
  PV UUID               EMy26w-jUZE-JaGV-EOW5-8vGo-Y3Zx-Ifv3ov

  --- Physical volume ---
  PV Name               /dev/sdc1
  VG Name               vg_share
  PV Size               200.00 GB / not usable 1.34 MB
  Allocatable           yes (but full)
  PE Size (KByte)       4096
  Total PE              51199
  Free PE               0
  Allocated PE          51199
  PV UUID               J1kTxh-qWD5-v9B6-Am1v-P3mq-AkSq-lGzrlN

# vgdisplay

  --- Volume group ---
  VG Name               vg_flash
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  2
  VG Access             read/write
  VG Status             resizable
  Clustered             yes
  Shared                no
  MAX LV                0
  Cur LV                1
  Open LV               1
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               200.00 GB
  PE Size               4.00 MB
  Total PE              51199
  Alloc PE / Size       51199 / 200.00 GB
  Free  PE / Size       0 / 0
  VG UUID               g79A0p-0g5V-sXyY-kXsY-rWzt-6G7D-BEktoM

  --- Volume group ---
  VG Name               vg_share
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  2
  VG Access             read/write
  VG Status             resizable
  Clustered             yes
  Shared                no
  MAX LV                0
  Cur LV                1
  Open LV               1
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               200.00 GB
  PE Size               4.00 MB
  Total PE              51199
  Alloc PE / Size       51199 / 200.00 GB
  Free  PE / Size       0 / 0
  VG UUID               g9fIcp-seMM-30O0-sz1r-ttMr-NkeF-KVRF5z

# lvdisplay

  --- Logical volume ---
  LV Name                /dev/vg_flash/lv_flash_1
  VG Name                vg_flash
  LV UUID                rYCXFB-7Dxw-qWCi-G8fc-uF1t-coP1-cD1VLp
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                200.00 GB
  Current LE             51199
  Segments               1
  Allocation             inherit
  Read ahead sectors     0
  Block device           253:0

  --- Logical volume ---
  LV Name                /dev/vg_share/lv_share_1
  VG Name                vg_share
  LV UUID                LvEZ3t-aYda-VU2i-rlYV-oXj6-nQWq-c5Fq8M
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                200.00 GB
  Current LE             51199
  Segments               1
  Allocation             inherit
  Read ahead sectors     0
  Block device           253:1

# cat /etc/fstab

LABEL=/                 /                       ext3    defaults        1 1
LABEL=/boot             /boot                   ext3    defaults        1 2
devpts                  /dev/pts                devpts  gid=5,mode=620  0 0
tmpfs                   /dev/shm                tmpfs   defaults        0 0
proc                    /proc                   proc    defaults        0 0
sysfs                   /sys                    sysfs   defaults        0 0
LABEL=SW-cciss/c0d0p2   swap                    swap    defaults        0 0
/dev/vg_share/lv_share_1        /share          gfs     defaults        1 2
/dev/vg_flash/lv_flash_1        /u02            gfs     defaults        1 2

# df -h

Filesystem            Size  Used Avail Use% Mounted on
/dev/cciss/c0d0p3      29G   13G   16G  45% /
/dev/cciss/c0d0p1      99M   15M   80M  16% /boot
tmpfs                 2.0G     0  2.0G   0% /dev/shm
/dev/vg_share/lv_share_1
                      199G  100K  199G   1% /share
/dev/vg_flash/lv_flash_1
                      199G  8.1G  191G   5% /u02

# cat /etc/cluster/cluster.conf

<cluster alias="orarac" config_version="25" name="orarac">
        <fence_daemon clean_start="1" post_fail_delay="0" post_join_delay="20"/>
        <clusternodes>
                <clusternode name="racnode1" nodeid="1" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="iLO01"/>
                                </method>
                                <method name="2">
                                        <device name="Operator1"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="racnode2" nodeid="2" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="iLO09"/>
                                </method>
                                <method name="2">
                                        <device name="Operator2"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="3"/>
        <fencedevices>
                <fencedevice agent="fence_ilo" hostname="192.168.14.11" login="Administrator" name="iLO01" passwd="********"/>
                <fencedevice agent="fence_ilo" hostname="192.168.14.19" login="Administrator" name="iLO09" passwd="********"/>
                <fencedevice agent="fence_manual" name="Operator1"/>
                <fencedevice agent="fence_manual" name="Operator2"/>
        </fencedevices>
        <quorumd device="/dev/sda1" interval="1" min_score="1" tko="10" votes="1">
                <heuristic interval="10" program="ping -t1 -c1 192.168.14.10" score="1"/>
        </quorumd>
</cluster>
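
One note about the backup fence_manual methods above: if the iLO method ever
fails and the manual method kicks in, fenced waits for an operator
acknowledgement, and until that arrives the fence domain - and with it dlm and
clvmd - stays blocked cluster-wide. The acknowledgement is given with something
like the following (option quoted from memory, so check the man page):

# fence_ack_manual -n racnode2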


# clustat

Member Status: Quorate

  Member Name                        ID   Status
  ------ ----                        ---- ------
  racnode1                              1 Online, Local
  racnode2                              2 Online
  /dev/sda1                             0 Online, Quorum Disk

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
