On Fri, 23 Oct 2009, Bron Gondwana wrote:

> I've seen heartbeat get split brain before. We gave up on it. We do
> all our fencing via humans now! Check the KVM, kick the box, manually
> run the failover script.

Some of my colleagues have had a lot of grief with Heartbeat going split
brain. It seems to really be designed for a pair of machines sitting next
to each other in a rack with a serial link for the heartbeat, rather than
servers installed in a pair of machine rooms three miles apart.

We do manual failover with our Cyrus mailstores: I would rather 1/8th of
my users had an outage of a couple of hours (and typically just a few
minutes) than end up with a split brain.

On the one occasion in five years that we did end up with a Cyrus split
brain (replication failed because of a memory DIMM error and then the
entire master failed a few minutes later) it was easy enough to fish
missing messages out of the dead system the following day and reinject
them using LMTP. Certainly easier than reengineering the entire Cyrus
mailstore to allow active/active replication.

On Wed, Oct 21, 2009 at 08:45:11PM +0200, David Touzeau wrote:

> I would like to know if it is possible to SET the replica has the master
> too in order to replicate new mail saved on the replica to the master
> and vis versa In this case it should be turn to active/active..

We do this to a limited degree: the set of active users on a pair of
mailstores can be partitioned and bounced back and forth between the two
servers in a pair. This is mostly useful for load balancing between our
two machine rooms, or migrating all the users off a master so that we can
patch and reboot without any user-visible downtime.

However this is using my own replication code rather than the branch
which was rewritten into Cyrus by Ken. I have additional safeguards to
stop sync_client from overwriting the master data in a pair (which has
only ever happened because of stupidity on my part when testing).
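
For readers wondering what the LMTP reinjection mentioned earlier might
look like, here is a minimal Python sketch. It is an illustration only,
not the procedure actually used here: the function name `reinject`, the
placeholder sender address, and the host/port defaults are all
assumptions. It relies only on the standard library's `smtplib.LMTP`,
which also accepts an absolute Unix socket path as `host`.

```python
import smtplib
from pathlib import Path


def reinject(paths, recipient, sender="postmaster@example.org",
             host="localhost", port=2003, connect=smtplib.LMTP):
    """Deliver each rescued message file to `recipient` over LMTP.

    Hypothetical helper: `paths` are files holding one raw RFC 822
    message each, e.g. copied off the failed master's spool.
    Returns a list of (path, error) pairs for messages that failed.
    """
    failed = []
    # `connect` is injectable so the logic can be exercised without a server.
    with connect(host, port) as lmtp:
        for path in paths:
            raw = Path(path).read_bytes()
            try:
                # One recipient per transaction keeps the LMTP reply unambiguous.
                lmtp.sendmail(sender, [recipient], raw)
            except smtplib.SMTPException as exc:
                failed.append((str(path), str(exc)))
    return failed
```

Delivering one message per transaction means a single rejected message
is recorded and skipped rather than aborting the whole batch.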

I've never used the standard replication code in Cyrus other than to
backport (sideport?) additional features such as CONDSTORE and GUID
support. Given the grief Fastmail had with the early Cyrus replication
code I think that I'm rather glad about this.

Every once in a while I think about moving to standard Cyrus replication.
Unfortunately there are a lot of warts that I really don't like. It is
much easier to just drop my own replication code onto new versions of
Cyrus (typically < 5 minutes' work each time). That was one of my
original design objectives.

-- 
David Carter                             Email: David.Carter@xxxxxxxxxxxxx
University Computing Service,            Phone: (01223) 334502
New Museums Site, Pembroke Street,       Fax:   (01223) 334679
Cambridge UK. CB2 3QH.
----
Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html