I have been experimenting with clustering prior to setting up a datacentre system providing high availability storage with automatic failover in the case of individual disk failure or complete disk server failure. I am new to Linux clustering, and am not sure whether the problems I am finding are due to my misuse or to bugs.

The intended final setup is two disk servers using hardware RAID6, each exporting a single large block device using GNBD, with clients accessing filesystems on the imported storage using CLVM/GFS. To provide extra redundancy I am hoping to configure CLVM to use mirroring to duplicate the logical volumes across the two servers. Systems are all CentOS-5.

First problem was trying to use Conga. It seems that ricci doesn't work on CentOS-5. I tried the workaround (http://bugs.centos.org/view.php?id=1931) of replacing the CentOS version of /etc/redhat-release with the RHEL5 version, which did enable me to set up a two-node cluster, but it failed when I tried to configure storage. So I am using a combination of manual configuration, system-config-cluster and system-config-lvm.

Second problem was the lack of startup scripts for GNBD. I have rolled my own, using a /etc/gndb.conf file to specify the exports and imports, which seems to work fine, but leaves me worried that GNBD may not be popular enough to be fully supported.

Third problem is getting a two-node cluster to start up correctly. For initial testing I have one disk server and one client in the cluster. I accept that the quorum arrangements are difficult for a two-node cluster, but I was concerned to find that just rebooting one node with the other remaining up would not work reliably. It often hung permanently in shutdown (I think it was clurgmgrd hanging, which is odd as I have no cluster resources/services configured), and frequently hung for 5 minutes in startup of CLVM on the disk server (where there are actually no logical volumes).
This was solved by removing the two-node option in the cluster config, and giving the disk server a high vote. This means the client can never be quorate on its own, but that doesn't matter as its only use of clustering is to import the shared disks. If this is a sensible solution, I think it would be worth documenting somewhere.

Fourth problem (and my main concern) is setting up mirroring. I am wondering whether this is actually possible in a clustering environment. The idea is that all filestore partitions will be mirrored over the two file servers, so if one of the servers fails completely, LVM will seamlessly switch to using the partitions unmirrored from the remaining server.

It seems however that LVM mirroring either needs three physical devices (the third for keeping the mirror log) or runs using a corelog. If I use the former, I have to find another block device to export, which is another point of failure, and if I use core logging I don't see how the mirror log can be maintained cluster-wide. It does not seem possible to create a corelog mirror using system-config-lvm.

I have tried making a disk log mirror, both with system-config-lvm and manually with lvcreate, with no luck. With lvcreate I get locking errors, due to the LV UUID not being recognized; the UUID reported appears to be the concatenation of two UUIDs. Rebooting the client seems to clear this, but then I find that system-config-lvm crashes on startup, and if I try to manually make a GFS filesystem on the mirror it always reports that the device is too small for the journals. When I try to make a mirror using system-config-lvm it fails, leaving just the disk log LV made.

I'd appreciate any help here. Is what I'm trying possible, or is there a better way to achieve failover in the event of a complete disk server failure? Also, are any of my problems (excluding ricci) currently known bugs, and is it worth trying a build from cvs/svn, or waiting for CentOS-5 updates?

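To make the quorum workaround from the third problem concrete, the vote arithmetic can be sketched as below. This is only an illustration: it assumes a simple majority rule (quorum is reached with more than half of all configured votes), and the node names and vote counts are hypothetical, not taken from the actual cluster configuration.

```python
def quorum_threshold(votes):
    """Votes needed for quorum, assuming a simple majority rule:
    more than half of all configured votes."""
    return sum(votes.values()) // 2 + 1


def is_quorate(votes, alive):
    """True if the nodes in `alive` together reach the threshold."""
    present = sum(votes[node] for node in alive)
    return present >= quorum_threshold(votes)


# Hypothetical names and vote counts, chosen only for illustration.
weighted = {"diskserver": 3, "client": 1}

# With equal votes and no two-node special case, neither node is
# quorate alone, so a single reboot stalls the survivor.
equal = {"diskserver": 1, "client": 1}

if __name__ == "__main__":
    for label, votes in (("weighted", weighted), ("equal", equal)):
        print(label, "threshold:", quorum_threshold(votes))
        for node in votes:
            print("  ", node, "alone quorate:", is_quorate(votes, {node}))
```

With the weighted votes the disk server stays quorate while the client reboots, and the client alone never reaches quorum, which matches the behaviour described above.
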
I have deliberately omitted the gory details of the various problems, but I am happy to provide more detail on request.

--
Cliff

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster