Re: Cyrus Murder documentation?


 



Hi Nels


Quoting Nels Lindquist <nlindq@xxxxxxx>:

Hi, all.

I'm looking at moving from single server Cyrus IMAPD to a murder configuration.

I'd like to set up a multi-server configuration for high availability and load balancing, but I'd like more information about Replicated Murder (doc note: This section needs to be written..) about which the "Features" section of the docs simply states, "All backends have access to all mailboxes".


We have been running a load-balanced, not fully automated "high availability"
Cyrus murder setup for ~10 years. Our main goals for replication were disaster
recovery and manual failover for maintenance.

We have not fully automated the HA part because detecting
whether a server is down or only the connection is down (aka split brain) is hard,
and having an admin trigger the failover scripts was regarded as sufficient
for our environment.

How does one achieve that state, and can load balancing still be done in that configuration?


The setup you are considering below is the way we did it, and is,
IMHO, the only way to do it at the moment. I will describe our setup
in more detail at the end, but it is best if I explain some of the
limitations/problems first.

The documentation for all versions between 3.0.x and the latest 3.4.x seems to be incomplete, and I'm wondering where (or whether) the current documentation may be found? I've looked at everything included with the source, including the legacy 2.5 docs (which seem to be out of date) and the configuration examples, which seem to cover murder frontend/backend/mupdate servers, and replication master/replica servers but not both simultaneously.


Documentation is the weak spot in many open source projects.

The main limiting factors for combining murder+replication are the following:

1. In the murder setup every mailbox name must be unique across
   the backends. The mupdate server manages a list of all mailboxes
   with the information on which ONE backend server each mailbox is stored.
   The mupdate/frontend server code would have to be extended to handle
   the location(s) of a replicated mailbox.

2. The replication protocol is not fully active/active (yet).
   Without full active/active replication, Cyrus must only write
   to one backend, or you must be able to fix problems caused by split brain.

   The developers are working on making the protocol active/active,
   but AFAIK there is still much work to do.
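
To illustrate limitation 1, here is a minimal toy model in Python. It is not Cyrus code; the class and method names are made up for this example. It only shows why a list that maps each mailbox name to exactly ONE backend has no place to record a second (replica) location.

```python
class MupdateRegistry:
    """Toy model of a mailbox list with exactly one backend per mailbox.

    Hypothetical illustration only; the real mupdate server is not
    implemented this way.
    """

    def __init__(self):
        self._location = {}  # mailbox name -> the single backend holding it

    def register(self, mailbox, backend):
        current = self._location.get(mailbox)
        if current is not None and current != backend:
            # A second location (e.g. a replica) cannot be recorded.
            raise ValueError(f"{mailbox} is already located on {current}")
        self._location[mailbox] = backend

    def lookup(self, mailbox):
        # Frontends get exactly one backend back for a mailbox.
        return self._location[mailbox]


registry = MupdateRegistry()
registry.register("user.alice", "ma01")
print(registry.lookup("user.alice"))
```

Registering "user.alice" a second time with "sl01" raises an error in this model, which is the situation a replica would be in if it tried to announce its copy.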

At the moment the replication protocol can handle the following changes on the replica:
   - new mails
   - moved mails
   - deleted mails
   But there are still problems with:
   - renaming mailboxes
   - subscribing/unsubscribing mailboxes
   I am not sure about:
   - deleting/creating mailboxes
   - changes to sieve scripts
   - quota
   - flagging mails


Otherwise I was considering four backend servers--two to be part of the murder, with each murder backend independently replicating to another standalone backend, with recovery to a "fresh" mupdate server using whichever combination of functional primary/secondary backends contain the correct superset of mailboxes.

Thoughts or pointers to additional documentation would be much appreciated.


Our Setup:

We have two locations where we host our mail servers (smtp, mx, cyrus,
webmail and database servers) as virtual servers (RHEV). The storage is iSCSI based with HDDs and SSDs. We have ~47,000 accounts, ~100,000 mailboxes, ~60 TB Mails+Replic

The connection-based load balancing to the Cyrus frontends (smtp/imap/pop/sieve) is done by two LVS/IPVS servers with VRRP failover. These load balancers also handle
the connections to the webmailer and SMTP server.

We have 6 Cyrus servers (cyrus-imapd 3.0.x), each hosting 3 instances (frontend, backend, replica). These servers are paired, so that each backend is replicated to a server at the
other location.

Location A             Location B
fe01                   fe02
ma01            ->     sl01
sl02            <-     ma02

fe03                   fe04
ma03            ->     sl03
sl04            <-     ma04

fe05                   fe06
ma05            ->     sl05
sl06            <-     ma06

In addition we have two servers running only frontend instances.
The mupdate master instance runs on one of those 8 servers.

Each backend (ma01 - ma06), each replica (sl01 - sl06) and the mupdate master instance has its own IP address. In case of a planned failover we check that the replication log on the backend is empty or tiny, stop the backend, wait for the replication to sync the last changes, and stop the replica. We then switch the configuration and IP address and start the backend and the replica at the other location (the server at one location is then running two backends, and the paired server at the other location is running two replicas). We can then stop the server with the two replicas for maintenance.
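
The planned failover order can be written down as a small Python sketch. This is not our actual failover script: the function name, the `max_pending` threshold and the step wording are assumptions for illustration, and the steps are only returned as text, not executed.

```python
def planned_failover_steps(pending_log_entries, max_pending=10):
    """Return the ordered steps for a planned failover.

    pending_log_entries: number of entries still in the backend's
    replication log. max_pending is a hypothetical threshold for
    "empty or tiny"; choose a value that fits your environment.
    """
    if pending_log_entries > max_pending:
        # Too much unsynced data: do not begin the failover yet.
        raise RuntimeError(
            f"replication log has {pending_log_entries} entries, wait for sync"
        )
    return [
        "stop the backend",
        "wait for the replication to sync the last changes",
        "stop the replica",
        "switch the configuration and IP address",
        "start the backend at the other location",
        "start the replica at the other location",
    ]


for number, step in enumerate(planned_failover_steps(pending_log_entries=0), 1):
    print(number, step)
```

The precondition check comes first on purpose: the backend is only stopped once the remaining replication backlog is known to be small.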

In our first attempts we tried updating the mailbox location on the mupdate server instead of
switching the IP addresses, but this was too slow.


An unplanned failover is similar to the planned failover:
try to ensure that the server that is not reachable is really down, stop the replica instance,
switch the configuration/IP, and restart the instance as backend.

On reboot, each server tries to determine for both of its instances whether the backend or the replica role is already in use by the paired server (arping), and starts the instance in the other role, or not at all. Starting as a replica also fails if a replication log is found (incomplete sync/split brain). In this case an admin has to check the replication log and decide whether there are critical changes on the replica (the old backend) that the replication protocol cannot handle and that must be replicated manually to the new backend. In either case, copy the sync log to the new backend and manually run sync_client with the old log to
ensure all data is in sync.
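
The role decision on reboot can be sketched as a small Python function. This is an illustration, not our boot script: the function and parameter names are hypothetical, the arping probe itself is left out, and refusing to start when neither or both addresses answer is an assumption of this sketch.

```python
def decide_role(peer_has_backend_ip, peer_has_replica_ip, synclog_pending):
    """Decide the role for one instance of a server pair after reboot.

    peer_has_backend_ip / peer_has_replica_ip: whether the paired server
    currently answers on the backend / replica address (for example as
    probed with arping). synclog_pending: whether a leftover replication
    log exists locally. Returns "backend", "replica", or None (do not start).
    """
    if peer_has_backend_ip and peer_has_replica_ip:
        return None  # inconsistent state, leave it to an admin (assumption)
    if peer_has_backend_ip:
        # Peer is the backend, so this side would be the replica, unless
        # unsynced changes remain from the time it was the backend.
        return None if synclog_pending else "replica"
    if peer_has_replica_ip:
        return "backend"
    return None  # nothing answers: do not guess a role (assumption)


print(decide_role(peer_has_backend_ip=True, peer_has_replica_ip=False,
                  synclog_pending=False))
```

A result of None with a pending replication log is the case where an admin has to inspect the log before the instance is started.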

A mupdate master instance on a failed server can be restarted on any other server. We copy the mailboxes.db from the frontend instance before the start (while the mupdate master is down, mailboxes.db cannot change) and trigger a "ctl_mboxlist -m -a" on all
backend instances to ensure that the mupdate master is up to date.
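
As a sketch, the resync commands for all backend instances could be generated like this in Python. The configuration file paths are hypothetical placeholders (each instance is assumed to have its own imapd.conf selected with -C), and the commands are only built and printed, not executed.

```python
def resync_commands(backend_configs):
    """Build one 'ctl_mboxlist -m -a' command per backend instance.

    backend_configs: mapping of instance name -> path to that instance's
    configuration file. The paths used below are made up for this example.
    """
    return {
        name: ["ctl_mboxlist", "-C", conf, "-m", "-a"]
        for name, conf in backend_configs.items()
    }


# Hypothetical instance names and config paths.
configs = {
    "ma01": "/etc/cyrus/ma01/imapd.conf",
    "ma02": "/etc/cyrus/ma02/imapd.conf",
}
for name, argv in resync_commands(configs).items():
    print(name, " ".join(argv))
```

Each command would be run as the Cyrus user on the host where that backend instance lives, after the new mupdate master has been started.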




   Michael





--------------------------------------------------------------------------------
Michael Menge                          Tel.: (49) 7071 / 29-70316
Universität Tübingen                   Fax.: (49) 7071 / 29-5912
Zentrum für Datenverarbeitung mail: michael.menge@xxxxxxxxxxxxxxxxxxxx
Wächterstraße 76
72074 Tübingen


------------------------------------------
Cyrus: Info
Permalink: https://cyrus.topicbox.com/groups/info/T7a361715be957813-Mf88daa8db4a1841a96dea3e9
Delivery options: https://cyrus.topicbox.com/groups/info/subscription



