On Wed, Mar 22, 2006 at 01:18:56PM -0600, Benjamin Marzinski wrote:
> On Wed, Mar 22, 2006 at 02:10:30PM +0300, Denis Medvedev wrote:
> >
> > A better approach is to export not a GNBD but an iSCSI device from DRBD.
>
> I would definitely go with DRBD for this setup. If I understand this setup
> correctly, there is a data corruption possibility.
>
> If you have two machines doing raid1 over a local device and a gnbd device,
> you have the problem where, if machine A dies after it has written to its
> local disk but not the disk on machine B, the mirror is out of sync. GNBD
> doesn't do anything to help with that, and md on machine B doesn't know
> anything about the state of machine A, so it can't correct the problem. So
> you are left with an out-of-sync mirror, which is BAD. DRBD was made for
> exactly this setup, and will (I believe) automagically handle this correctly.

This is ignoring the obvious issue that after machine A is dead, B will
presumably keep writing to its device, so it will obviously be out of sync.
And you probably knew that. It's been a long week. But still, this sounds
exactly like what DRBD was designed for.

-Ben

> -Ben
>
> > James Firth wrote:
> > >
> > > Patton, Matthew F, CTR, OSD-PA&E wrote:
> > >
> > > > I can't think of a way to combine (C)LVM, GFS, GNBD, and MD (software
> > > > RAID) and make it work unless just one of the nodes becomes the MD
> > > > master and then just exports it via NFS. Can it be done? Do
> > > > commercial options exist to pull off this trick?
> > >
> > > Hi,
> > >
> > > We're working on the same problem. We have tried two approaches, both
> > > with their own fairly serious drawbacks.
> > >
> > > Our goal was a 2-node all-in-one HA mega server, providing all office
> > > services from one cluster, and with no single point of failure.
> > >
> > > The first uses a raid master for each pair. Each member of the pair
> > > exports a disk using GNBD. The pair negotiate a master using CMAN,
> > > and that master assembles a RAID device using one GNBD import, plus
> > > one local disk, and then exports it using NFS, or in the case of GFS
> > > being used, exports the assembled raid device via a third GNBD export.
> > >
> > > Our trick here was that each node exported its contributory disk, using
> > > GNBD, by default, so long as at least one other node was active
> > > (quorum > 1), knowing only one master would ever be active. This
> > > significantly reduced complexity.
> > >
> > > Problems are:
> > >  - GNBD instabilities cause frequent locks and crashes, especially
> > >    busying DLM (suspected).
> > >  - The NFS export scheme also causes locks and hangs to NFS clients on
> > >    failover *IF* a member of the pair then also imports it as an NFS
> > >    client, as needed in some of our mega-server ideas.
> > >  - NFS export is not too useful when file locking is important, e.g.
> > >    subversion, procmail etc (yes, procmail, if your mail server is also
> > >    your Samba homes server). You have to tell procmail to use
> > >    alternative mailbox locking or else mailboxes get corrupted.
> > >  - GFS on the assembled device with the GNBD export scheme works best,
> > >    but still causes locks and hangs. Note also an exporting client must
> > >    NOT import its own exported GNBD volume, so there is no symmetry
> > >    between the pair, and it's quite difficult to manage.
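For anyone trying to picture that first approach, the assembly would look
roughly like the following; the hostnames, devices and export names here are
made up for illustration, not taken from James's actual setup:

    # on node-b: serve its contributory disk over GNBD
    gnbd_serv -v
    gnbd_export -v -e diskb -d /dev/sdb1

    # on node-a (the CMAN-elected raid master): import node-b's export,
    # which should show up as /dev/gnbd/diskb
    gnbd_import -v -i node-b

    # assemble the mirror from the local disk plus the imported device
    mdadm --create /dev/md0 --level=1 --raid-devices=2 \
          /dev/sdb1 /dev/gnbd/diskb

    # then export the assembled device, either via NFS or (for GFS)
    # as a third GNBD export
    gnbd_export -v -e mirror -d /dev/md0

Nothing in that stack knows how to resync the two halves if the master dies
between writing its local disk and the GNBD device, which is exactly the
corruption window described above.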
> > > Our second approach is something we've just embarked on, and so far is
> > > proving more successful, using DRBD. DRBD is used to create a mirrored
> > > pair of volumes, a bit like GNBD+MD as above.
> > >
> > > The result is a block device accessible from both machines, but the
> > > problem is that only one member of the pair is writable (master), and
> > > the other is a read-only mount.
> > >
> > > If the master server dies, the remaining DRBD node becomes the master,
> > > and becomes writable. When the dead node recovers, the recovered node
> > > becomes a slave, read-only.
> > >
> > > The problem is with the read-only aspect, so you still need to have an
> > > exporting mechanism for the assembled DRBD volume running on the DRBD
> > > master. We plan to do this via GNBD export (GFS FS installed).
> > >
> > > That's where the complexity comes in: the DRBD negotiation appears to
> > > be totally independent of the cluster control suite, so we're having
> > > to use customizations to start the exporting daemon on the DRBD master.
> > >
> > > Conclusions
> > > -----------
> > >
> > > From all we've learned to date, it still seems a dedicated file server
> > > or SAN approach is necessary to maintain availability.
> > >
> > > Either of the above schemes would work fairly well if we were just
> > > building an HA storage component, because most of the complexities
> > > we've encountered come about when the shared storage device is used by
> > > services on the same cluster nodes.
> > >
> > > Most, if not all, of what we've done so far is not suitable for a
> > > production environment, as it just increases the coupling between
> > > nodes, and therefore increases the chance of a cascade failure of the
> > > cluster. In all seriousness I believe a single machine with a RAID-1
> > > pair has a higher MTBF than any of our experiments.
> > >
> > > Many parts of the CCS/GFS suite so far released have serious issues
> > > when used in non-standard configurations. For example, the exception
> > > handling we've encountered usually defaults to
> > > "while (1) { retry(); sleep(1); }".
> > >
> > > I read last year about plans for GFS mirroring from Red Hat, and
> > > haven't found much else since. If anyone knows more I'd love to hear.
> > >
> > > It also appears that the guys behind DRBD want to further develop
> > > their mirroring so that both volumes can be writable, in which case
> > > you can just stick GFS on the assembled device, and run whichever
> > > exporting method you like as a normal cluster service.
> > >
> > > James
> > >
> > > www.daltonfirth.co.uk
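For completeness, the DRBD pairing James describes is driven by a drbd.conf
resource along these lines; the hostnames, addresses and backing devices are
placeholders, and this is only a sketch of the sort of definition involved,
not his actual config:

    resource r0 {
      protocol C;                   # synchronous replication between the pair
      on node-a {
        device    /dev/drbd0;
        disk      /dev/sdb1;        # local backing disk
        address   192.168.0.1:7788;
        meta-disk internal;
      }
      on node-b {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   192.168.0.2:7788;
        meta-disk internal;
      }
    }

Both nodes bring the resource up with "drbdadm up r0", but only the node
promoted with "drbdadm primary r0" can write to /dev/drbd0. That is why the
export service has to follow the primary around, and why some glue outside the
cluster control suite is needed to run the promotion plus the GNBD/NFS export
on whichever node survives a failover.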