Patton, Matthew F, CTR, OSD-PA&E wrote:
I can't think of a way to combine (C)LVM, GFS, GNBD, and MD (software
RAID) and make it work unless just one of the nodes becomes the MD
master and then just exports it via NFS. Can it be done? Do commercial
options exist to pull off this trick?
Hi,
We're working on the same problem. We have tried two approaches, both
with their own fairly serious drawbacks.
Our goal was a 2-node, all-in-one HA mega-server, providing all office
services from one cluster with no single point of failure.
The first approach uses a RAID master for each pair. Each member of the
pair exports a disk using GNBD. The pair negotiates a master using
CMAN, and that master assembles a RAID device from one GNBD import
plus one local disk, then exports it via NFS, or, in the case of GFS
being used, exports the assembled RAID device via a third GNBD export.
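For illustration, the per-pair assembly looks roughly like this (device
and export names are made up, and the exact gnbd_export/gnbd_import
flags should be checked against the man pages):

  # On each pair member: export the contributory disk over GNBD
  gnbd_export -d /dev/sdb1 -e disk_nodeA

  # On whichever node CMAN elects master: import the peer's disk...
  gnbd_import -i nodeB

  # ...and mirror one local disk with the imported one using MD
  mdadm --create /dev/md0 --level=1 --raid-devices=2 \
        /dev/sdb1 /dev/gnbd/disk_nodeB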
Our trick here was that each node exported its contributory disk via
GNBD by default, as long as at least one other node was active (quorum
> 1), knowing that only one master would ever be active. This
significantly reduced complexity.
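The export-by-default logic is roughly the following (a sketch only;
the quorum test by parsing cman_tool output is an assumption, so adapt
it to whatever your version actually prints):

  #!/bin/sh
  # Export our contributory disk whenever the cluster is quorate,
  # trusting CMAN to ensure only one RAID master is ever active.
  if cman_tool status | grep -q 'Quorate: Yes'; then
      gnbd_export -d /dev/sdb1 -e disk_$(hostname -s)
  fi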
Problems are:
- GNBD instabilities cause frequent locks and crashes, which we
suspect come from busying the DLM.
- The NFS export scheme also causes locks and hangs for NFS clients on
failover *IF* a member of the pair then itself mounts the export as an
NFS client, as needed in some of our mega-server ideas.
- NFS export is not much use when file locking is important, e.g.
Subversion, procmail, etc. (yes, procmail, if your mail server is also
your Samba homes server). You have to tell procmail to use alternative
mailbox locking or mailboxes get corrupted (see the sketch after this
list).
- GFS on the assembled device with the GNBD export scheme works best,
but still causes locks and hangs. Note also that an exporting node
must NOT import its own exported GNBD volume, so there is no symmetry
between the pair, and it's quite difficult to manage.
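On the procmail point above, the workaround is to force local
(dot-style) lockfiles instead of relying on kernel locking across NFS.
A minimal ~/.procmailrc sketch (paths assumed):

  MAILDIR=$HOME/mail
  LOGFILE=$MAILDIR/procmail.log

  # the trailing colon makes procmail take a local lockfile
  # ($DEFAULT.lock) rather than depending on fcntl() over NFS
  :0:
  $DEFAULT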
Our second approach is something we've just embarked on, and so far it
is proving more successful: it uses DRBD to create a mirrored pair of
volumes, a bit like GNBD+MD above.
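For the curious, our resource definition is along these lines
(hostnames, devices and addresses are invented; check the syntax
against your drbd.conf man page):

  resource r0 {
    protocol C;                   # synchronous replication
    on nodeA {
      device    /dev/drbd0;
      disk      /dev/sdb1;
      address   192.168.0.1:7788;
      meta-disk internal;
    }
    on nodeB {
      device    /dev/drbd0;
      disk      /dev/sdb1;
      address   192.168.0.2:7788;
      meta-disk internal;
    }
  }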
The result is a block device accessible from both machines, but the
problem is that only one member of the pair (the master) is writable;
the other is a read-only mount.
If the master server dies, the remaining DRBD node becomes the master
and becomes writable. When the dead node recovers, it rejoins as a
slave, read-only.
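The promotion itself is essentially a one-liner (the resource name and
mount point here are assumptions):

  # on the surviving node, once the peer is confirmed dead
  drbdadm primary r0
  mount /dev/drbd0 /srv/shared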
The problem is the read-only aspect: you still need an export
mechanism for the assembled DRBD volume, running on the DRBD master.
We plan to do this via a GNBD export (with a GFS filesystem on the
volume).
That's where the complexity comes in: the DRBD negotiation appears to
be totally independent of the cluster control suite, so we're having
to use customizations to start the exporting daemon on the DRBD
master, roughly as sketched below.
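Our customization amounts to something like this (the state parsing
and the gnbd_export removal flag are assumptions on our part, not
anything shipped with the suite):

  #!/bin/sh
  # run from a cluster/heartbeat hook: export the GFS-bearing DRBD
  # device over GNBD only while this node is the DRBD primary
  ROLE=$(drbdadm state r0 | cut -d/ -f1)
  if [ "$ROLE" = "Primary" ]; then
      gnbd_export -d /dev/drbd0 -e shared_gfs
  else
      gnbd_export -r shared_gfs   # drop the export on demotion
  fi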
Conclusions
-----------
From all we've learned to date, it still seems a dedicated file server
or SAN approach is necessary to maintain availability.
Either of the above schemes would work fairly well if we were just
building an HA storage component, because most of the complexities
we've encountered arise when the shared storage device is used by
services on the same cluster nodes.
Most, if not all, of what we've done so far is not suitable for a
production environment, as it just increases the coupling between
nodes, and therefore the chance of a cascade failure of the cluster.
In all seriousness, I believe a single machine with a RAID-1 pair has
a higher MTBF than any of our experiments.
Many parts of the CCS/GFS suite released so far have serious issues
when used in non-standard configurations. For example, the exception
handling we've encountered usually defaults to
"while (1) { retry(); sleep(1); }".
I read last year about plans for GFS mirroring from Red Hat, but
haven't found much since. If anyone knows more, I'd love to hear it.
It also appears that the people behind DRBD want to develop their
mirroring further so that both volumes can be writable, in which case
you could just put GFS on the assembled device and run whichever
export method you like as a normal cluster service.
James
www.daltonfirth.co.uk
--
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster