Hi Nels
Quoting Nels Lindquist <nlindq@xxxxxxx>:
Hi, all.
I'm looking at moving from single server Cyrus IMAPD to a murder
configuration.
I'd like to set up a multi-server configuration for high
availability and load balancing, but I'd like more information about
Replicated Murder (doc note: This section needs to be written..)
about which the "Features" section of the docs simply states, "All
backends have access to all mailboxes".
We have been running a load-balanced, but not fully automated, "high
availability" Cyrus murder setup for ~10 years. Our main goals for
replication were disaster recovery and manual failover for maintenance.
We have not fully automated the HA part because detecting whether a
server is down or only the connection to it is down (the split-brain
problem) is hard, and having an admin trigger the failover scripts was
regarded as sufficient for our environment.
How does one achieve that state, and can load balancing still be
done in that configuration?
The setup you are considering below is the way we did it and is, IMHO,
currently the only way to do it. I will describe our setup in more
detail at the end, but it is best if I explain some of the
limitations/problems first.
The documentation for all versions between 3.0.x and the latest
3.4.x seems to be incomplete, and I'm wondering where (or whether)
the current documentation may be found? I've looked at everything
included with the source, including the legacy 2.5 docs (which seem
to be out of date) and the configuration examples, which seem to
cover murder frontend/backend/mupdate servers, and replication
master/replica servers but not both simultaneously.
Documentation is the weak spot in many open source projects.
The main limiting factors for combining murder+replication are the
following:
1. In the murder setup every mailbox name must be unique across
the backends. The mupdate server manages a list of all mailboxes,
recording on which ONE backend server each mailbox is stored.
The mupdate/frontend server code would have to be extended to handle
the location(s) of a replicated mailbox.
2. The replication protocol is not fully active/active (yet).
Without full active/active replication, Cyrus must write to only one
backend, or you must be able to fix the problems caused by split brain.
The developers are working on making the protocol active/active,
but AFAIK there is still much work to do.
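
To illustrate point 1: the sketch below is a toy model of a "one backend
per mailbox" list, not the real mupdate code. The class and method names
(MailboxDirectory, register, lookup) are made up for this example. It only
shows why a replica cannot simply be announced as a second location for
the same mailbox name.

```python
class MailboxDirectory:
    """Toy stand-in for the mupdate mailbox list (hypothetical, not Cyrus code)."""

    def __init__(self):
        # mailbox name -> the ONE backend that holds it
        self._location = {}

    def register(self, mailbox, backend):
        """Record where a mailbox lives; refuse a second, different location."""
        current = self._location.get(mailbox)
        if current is not None and current != backend:
            raise ValueError(f"{mailbox} is already registered on {current}")
        self._location[mailbox] = backend

    def lookup(self, mailbox):
        """Return the backend a frontend would proxy to for this mailbox."""
        return self._location[mailbox]


if __name__ == "__main__":
    directory = MailboxDirectory()
    directory.register("user.alice", "ma01")
    print(directory.lookup("user.alice"))
    try:
        # A replica announcing the same mailbox is rejected in this model.
        directory.register("user.alice", "sl01")
    except ValueError as err:
        print("rejected:", err)
```

In this model a replica has to stay invisible to the directory until it
takes over the identity of the backend it replaces.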
At the moment the replication protocol can handle the following
changes on the replica:
- new mails
- moved mails
- deleted mails
But there are still problems with:
- renaming mailboxes
- subscribing/unsubscribing mailboxes
I am not sure about:
- deleting/creating mailboxes
- changes to sieve scripts
- quota
- flagging mails
Otherwise I was considering four backend servers--two to be part of
the murder, with each murder backend independently replicating to
another standalone backend, with recovery to a "fresh" mupdate
server using whichever combination of functional primary/secondary
backends contain the correct superset of mailboxes.
Thoughts or pointers to additional documentation would be much appreciated.
Our setup:
We have two locations where we host our mail servers (SMTP, MX, Cyrus,
webmail and database servers) as virtual servers (RHEV). The storage
is iSCSI-based with HDDs and SSDs. We have ~47,000 accounts, ~100,000
mailboxes and ~60 TB of mail plus replicas.
The connection-based load balancing to the Cyrus frontends
(smtp/imap/pop/sieve) is done by two LVS/IPVS servers with VRRP
failover. These load balancers also handle the connections to the
webmailer and the SMTP servers.
We have 6 Cyrus servers (cyrus-imapd 3.0.x), each hosting 3 instances
(frontend, backend, replica). These servers are paired, so that each
backend is replicated to a server at the other location.
Location A      Location B
fe01            fe02
ma01     ->     sl01
sl02     <-     ma02
fe03            fe04
ma03     ->     sl03
sl04     <-     ma04
fe05            fe06
ma05     ->     sl05
sl06     <-     ma06
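
The normal (no failover active) layout above can be written down as a
small lookup. This is only a sketch for illustration: the function names
are made up, and it assumes the naming pattern visible in the table
(odd-numbered backends and frontends at location A, even-numbered ones at
location B, replicas the other way round).

```python
def location(instance):
    """Return 'A' or 'B' for an instance name such as 'ma01', 'sl02' or 'fe05'."""
    role, number = instance[:2], int(instance[2:])
    odd = number % 2 == 1
    if role in ("fe", "ma"):   # frontends and backends: odd -> A, even -> B
        return "A" if odd else "B"
    if role == "sl":           # replicas sit at the opposite location
        return "B" if odd else "A"
    raise ValueError(f"unknown instance name: {instance}")


def replica_of(backend):
    """Return the replica instance paired with a backend, e.g. ma03 -> sl03."""
    if not backend.startswith("ma"):
        raise ValueError(f"not a backend: {backend}")
    return "sl" + backend[2:]


if __name__ == "__main__":
    for n in range(1, 7):
        backend = f"ma{n:02d}"
        replica = replica_of(backend)
        print(f"{backend} ({location(backend)}) -> {replica} ({location(replica)})")
```

The property that matters for disaster recovery is that a backend and its
replica never share a location.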
In addition we have two servers running only frontend instances.
The mupdate master instance runs on one of those 8 servers.
Each backend (ma01 - ma06), each replica (sl01 - sl06) and the mupdate
master instance has its own IP address. In case of a planned failover we
check that the replication log on the backend is empty or tiny, stop
the backend, wait for the replication to sync the last changes, stop
the replica, switch the configuration and IP address, and start the
backend and the replica at the other location (the server at one
location is then running two backends, and the paired server at the
other location is running two replicas). We can then stop the server
with the two replicas for maintenance.
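
The order of the planned failover steps can be sketched as follows. This
is not our actual script: every method on "ops" (replication_log_size,
stop_backend, wait_until_synced, stop_replica, swap_config_and_ip,
start_backend, start_replica) is a hypothetical hook you would have to
implement for your own environment, and the max_log_bytes threshold is an
assumed stand-in for "empty or tiny".

```python
def planned_failover(pair, ops, max_log_bytes=4096):
    """Swap backend and replica roles for one pair, in the order described above."""
    # Only proceed when the replication log is empty or tiny (assumed threshold).
    if ops.replication_log_size(pair) > max_log_bytes:
        raise RuntimeError("replication log too large, wait and retry")
    ops.stop_backend(pair)         # no new writes from here on
    ops.wait_until_synced(pair)    # let the last changes reach the replica
    ops.stop_replica(pair)
    ops.swap_config_and_ip(pair)   # roles and IP addresses change sides
    ops.start_backend(pair)        # now runs at the other location
    ops.start_replica(pair)        # now runs where the backend used to be


class DryRunOps:
    """Records the calls instead of touching any server (for rehearsal only)."""

    def __init__(self, log_bytes=0):
        self.log_bytes = log_bytes
        self.calls = []

    def replication_log_size(self, pair):
        return self.log_bytes

    def __getattr__(self, name):
        # Any other hook is recorded as (hook name, pair) and does nothing else.
        def record(pair):
            self.calls.append((name, pair))
        return record


if __name__ == "__main__":
    ops = DryRunOps(log_bytes=0)
    planned_failover("ma01/sl01", ops)
    for step, pair in ops.calls:
        print(step, pair)
```

Stopping the backend before the final sync is the important part: with
only one writable side at any time, the missing active/active support in
the replication protocol does not matter.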
In our first attempts we tried updating the mailbox location on the
mupdate server instead of switching the IP addresses, but this was too
slow.
An unplanned failover is similar to a planned one: try to ensure that
the unreachable server really is down, stop the replica instance,
switch the configuration/IP, and restart the instance as a backend.
On reboot, each server tries to determine, for both of its instances,
whether the backend or the replica role is already in use by the paired
server (via arping), and then starts in the other role, or not at all.
Starting as a replica also fails if a replication log is found
(incomplete sync/split brain). In this case an admin has to check the
replication log and decide whether there are critical changes on the
replica (the old backend) that the replication protocol cannot handle
and that must be replicated manually to the new backend. In either
case, copy the sync log to the new backend and manually run sync_client
with the old log to ensure that all data is in sync.
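
The boot-time decision can be sketched as a small pure function. This is
an illustration, not our init script: the names are made up, the three
inputs stand for the results of the arping checks and of the check for a
leftover replication log, and the handling of "neither address answers"
and "both addresses answer" (do not start) is my assumption for this
example.

```python
from enum import Enum


class Action(Enum):
    START_AS_BACKEND = "backend"
    START_AS_REPLICA = "replica"
    DO_NOT_START = "none"


def decide_role(peer_has_backend_ip, peer_has_replica_ip, replication_log_present):
    """Pick the role for one instance after a reboot.

    peer_has_backend_ip / peer_has_replica_ip: whether the paired server
    currently answers on the backend / replica address of this pair.
    replication_log_present: whether an unprocessed replication log was found.
    """
    if peer_has_backend_ip and peer_has_replica_ip:
        # Assumption: an inconsistent state is left to an admin.
        return Action.DO_NOT_START
    if peer_has_backend_ip:
        # The peer is the backend, so we would become the replica, unless a
        # leftover log signals an incomplete sync or split brain.
        if replication_log_present:
            return Action.DO_NOT_START
        return Action.START_AS_REPLICA
    if peer_has_replica_ip:
        return Action.START_AS_BACKEND
    # Assumption: with no information about the peer, do not guess.
    return Action.DO_NOT_START


if __name__ == "__main__":
    print(decide_role(True, False, False))
    print(decide_role(True, False, True))
    print(decide_role(False, True, False))
```

Refusing to start when a replication log is present is what forces the
manual review described above.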
A mupdate master instance on a failed server can be restarted on any
other server. We copy the mailboxes.db from the frontend instance
before the start (while the mupdate master is down, mailboxes.db cannot
change) and trigger a "ctl_mboxlist -m -a" on all backend instances to
ensure that the mupdate master is up to date.
Michael
--------------------------------------------------------------------------------
Michael Menge Tel.: (49) 7071 / 29-70316
Universität Tübingen Fax.: (49) 7071 / 29-5912
Zentrum für Datenverarbeitung mail: michael.menge@xxxxxxxxxxxxxxxxxxxx
Wächterstraße 76
72074 Tübingen
------------------------------------------
Cyrus: Info
Permalink: https://cyrus.topicbox.com/groups/info/T7a361715be957813-Mf88daa8db4a1841a96dea3e9