Quoting Shuvam Misra <shuvam.misra@xxxxxxxxxxxxxx>:
Quoting Bron Gondwana <brong@xxxxxxxxxxx>:

> It's getting better, but it's still not 100% reliable to have
> master/master replication between two servers with interactions
> going to both sides.
>
> It SHOULD be safe now to have a single master/master setup with
> individual users on one side or the other - but note that nobody
> is known to be running that setup successfully yet.
>
> As for what the point is? I don't know about you, but I run a
> 24hr/day shop, and I like to be able to take a server down for
> maintenance in about 2 minutes, with users seeing a brief
> disconnection and then being able to keep using the service
> with minimal disruption.
>
> Bron.

As Bron already mentioned, you can easily live without master/master mode and its problems. We run multiple servers in pairs: each server runs one Cyrus instance as master and one as replica, so the two machines of a pair replicate each other. If one server crashes, the surviving one runs two master instances. You only need a way of splitting the users between the servers; that could be DNS, a proxy, or a murder setup.

Are you using local storage on each server for spool and metadata?
We have all Cyrus storage on iSCSI systems.
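A rough sketch of how such a pair might be wired up with Cyrus 2.3-style replication (hostnames, credentials, and instance names here are placeholders, not our actual config; with two instances per box you would also point each at its own configdirectory):

    # imapd.conf for the master instance on imap1 (placeholder names):
    sync_log: 1
    sync_host: imap2.example.com
    sync_authname: replication
    sync_password: secret

    # cyrus.conf for the replica instance on imap2: listen for sync traffic
    SERVICES {
      ...
      syncserver cmd="sync_server" listen="csync"
    }

Rolling replication is then driven by running sync_client -r on the master side.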
How good/bad is the idea of using shared storage (an external SAN chassis) and letting multiple servers keep their spool areas there? Can one set up, say, half a dozen servers in a pool, each using a separate LUN for spool+data on a common back-end SAN chassis? Out of the six servers, one would be a hot spare, standing by. If any of the five active servers failed, the standby would be told to mount the failed server's LUN, borrow the failed server's IP address, and start offering services?
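The takeover step could be scripted; a minimal sketch in Python, assuming open-iscsi and iputils are installed, where the IQN, portal, device, mount point, service IP, interface, and init script are all made-up placeholders (and a real setup would also need fencing so the failed node cannot come back and mount the same LUN concurrently):

    #!/usr/bin/env python
    # Hypothetical standby-node takeover sketch; all names are placeholders.
    import subprocess

    def run(*cmd):
        print("+", " ".join(cmd))
        subprocess.check_call(cmd)

    def take_over(target_iqn, portal, device, mountpoint, service_ip, iface):
        # 1. Log in to the failed server's iSCSI LUN.
        run("iscsiadm", "-m", "node", "-T", target_iqn, "-p", portal, "--login")
        # 2. Check the filesystem before mounting (this is the step whose
        #    duration worries me).
        run("fsck", "-y", device)
        run("mount", device, mountpoint)
        # 3. Borrow the failed server's IP and announce it via gratuitous ARP.
        run("ip", "addr", "add", service_ip + "/24", "dev", iface)
        run("arping", "-U", "-c", "3", "-I", iface, service_ip)
        # 4. Start Cyrus on the spool we just mounted.
        run("/etc/init.d/cyrus-imapd", "start")

    if __name__ == "__main__":
        take_over("iqn.2009-01.com.example:imap3", "192.0.2.10:3260",
                  "/dev/sdb1", "/var/spool/imap", "192.0.2.53", "eth0")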
That would work, but you would still have a single point of failure if the SAN system crashes or if the filesystem of one backend gets corrupted. We have six servers and two independent iSCSI systems. Each iSCSI system holds three partitions for active servers and three partitions for replicas.
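For what it's worth, my reading of that layout (server names invented, and presumably crossed so that each server's replica lives on the other iSCSI system than its active partition):

    iSCSI system A                     iSCSI system B
    --------------                     --------------
    active partition  imap1            active partition  imap4
    active partition  imap2            active partition  imap5
    active partition  imap3            active partition  imap6
    replica partition imap4            replica partition imap1
    replica partition imap5            replica partition imap2
    replica partition imap6            replica partition imap3

Crossed that way, losing either iSCSI system still leaves every mailbox reachable, as active or as replica, on the other.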
In this proposed model, each user's account lives on one "physical" server (i.e. bound to a specific IP address), so no load balancing or connection spreading is needed when clients connect. If the site chooses to use Murder, the proposed model can apply to the back-end while the multiplexer takes care of the front-end. The only thing I'm not sure about is filesystem corruption when a node goes down, and the time taken for an fsck before the standby node can assume the failed node's role. I wonder whether something like ext4 will reduce fsck times to acceptable levels.
The fsck time is one thing, but if you lose data in one partition you have a problem: restoring from a file-based backup is a pain when you have as many small files as Cyrus does.
Is this a good idea for a scalable fault-tolerant Cyrus setup? I've been toying with this approach for some time, for a proposed large-system design.
We are testing Cyrus murder to ease the work of switching over to a replica and back.

--------------------------------------------------------------------------------
M.Menge                         Tel.: (49) 7071/29-70316
Universität Tübingen            Fax.: (49) 7071/29-5912
Zentrum für Datenverarbeitung   mail: michael.menge@xxxxxxxxxxxxxxxxxxxx
Wächterstraße 76, 72074 Tübingen