Re: Implement Cyrus IMAPD in High Load Environment

Brian Awood <bawood@xxxxxxxxx> · Wed, 30 Sep 2009 01:01:28 -0400

On Tuesday 29 September 2009 @ 18:41, Bron Gondwana wrote:
>
> Possibly the secret is that we use IPAddr2 from linux-ha to force
> ARP flushes, and we transfer the primary IP address between
> machines, so nothing else needs to know - we just shut down one end
> and bring up the other with the IP and it's all good.

Our primaries and replicas are located in different data centers, and 
since we have no control over how the network is subdivided, it's 
impossible for them to take the same IPs.  

>
> Our process is:
>
> a) check there are less than 10kb of files in $conf/sync/ - else
>    abort
> b) shut down the master (host A)
> c) run sync_client -f $file on each file in $conf/sync (if any)
> c2) (if any sync fails, restart the master (host A))
> d) shut down the replica (host B)
> e) update the database with the new master location
> f) start up the replica (host A)
> g) start up the master (host B)
>
> This means replication starts immediately, because the replica is
> already there when the master starts.

So you just immediately start replicating back to a host (or site) 
that just failed?  How does that work?

We have a third level of machines that we sync to, in an out of band 
process, but the data is stored exactly the same way so we can start 
replicating to them immediately.  So even if an entire data center 
failed, we can still be running a fully replicated service with 
almost no downtime visible to users.
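
To make the order of the quoted switchover concrete, here is a rough 
Python sketch of steps (a) through (g).  It is only an illustration 
under stated assumptions: `Actions`, `switch_roles` and 
`SwitchoverAborted` are hypothetical names, not part of Cyrus, and each 
hook would wrap whatever site-specific commands are in use (for 
example, `sync_file` might shell out to `sync_client -f <file>`).  
Reading "10kb" as 10 * 1024 bytes is also an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List

# Assumption: "10kb" in step (a) is read as 10 * 1024 bytes.
MAX_BACKLOG_BYTES = 10 * 1024


class SwitchoverAborted(Exception):
    """Raised when the switchover stops before the roles are swapped."""


@dataclass
class Actions:
    """Hypothetical hooks; each one wraps a site-specific command."""
    backlog_files: Callable[[], List[str]]   # files currently in $conf/sync
    file_size: Callable[[str], int]          # size of one file, in bytes
    stop: Callable[[str], None]              # shut down Cyrus on a host
    start_master: Callable[[str], None]      # start a host as master
    start_replica: Callable[[str], None]     # start a host as replica
    sync_file: Callable[[str], bool]         # replay one sync log; True on success
    update_master_location: Callable[[str], None]  # record the new master


def switch_roles(act: Actions, host_a: str, host_b: str) -> None:
    """Swap roles: host_a (master) becomes replica, host_b becomes master."""
    # (a) abort if too much unreplicated work is still queued
    backlog = sum(act.file_size(f) for f in act.backlog_files())
    if backlog >= MAX_BACKLOG_BYTES:
        raise SwitchoverAborted("sync backlog too large: %d bytes" % backlog)

    # (b) shut down the current master
    act.stop(host_a)

    # (c) replay whatever is left in the sync directory, if anything
    for path in act.backlog_files():
        if not act.sync_file(path):
            # (c2) a failed sync means we put the old master back
            act.start_master(host_a)
            raise SwitchoverAborted("sync failed for %s" % path)

    # (d) shut down the current replica
    act.stop(host_b)
    # (e) record the new master location
    act.update_master_location(host_b)
    # (f) bring the old master back as the replica first...
    act.start_replica(host_a)
    # (g) ...so it is already listening when the new master starts
    act.start_master(host_b)
```

Starting the replica in (f) before the master in (g) is what lets 
replication begin immediately in the quoted setup.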

Brian
----
Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html