> Quoting Bron Gondwana <brong@xxxxxxxxxxx>:
>
> > It's getting better, but it's still not 100% reliable to have
> > master/master replication between two servers with interactions
> > going to both sides.
> >
> > It SHOULD be safe now to have a single master/master setup with
> > individual users on one side or the other - but note that nobody
> > is known to be running that setup successfully yet.
> >
> > As for what the point is? I don't know about you, but I run a
> > 24hr/day shop, and I like to be able to take a server down for
> > maintenance in about 2 minutes, with users seeing a brief
> > disconnection and then being able to keep using the service
> > with minimal disruption.
> >
> > Bron.
>
> As Bron already mentioned, master/master mode has problems you can
> easily live without.
>
> We run multiple servers in pairs: each server runs one Cyrus
> instance as master and one as slave, so that the members of a pair
> replicate each other. In case of a crash, one server would simply
> run two master instances.
>
> You only need a way of splitting the users between the servers.
> That could be DNS, a proxy, or a Murder setup.

Are you using local storage on each server for spool and metadata?

How good/bad is the idea of using shared storage (an external SAN
chassis) and letting multiple servers keep their spool areas there?
Can one set up, say, half a dozen servers in a pool, each using a
separate LUN for spool+metadata on a common back-end SAN chassis?
Out of the six servers, one would be a hot spare, standing by. If
any of the five active servers failed, the standby would be told to
mount the failed server's LUN, take over the failed server's IP
address, and start offering services in its place.

In this proposed model, each user's account lives on one "physical"
server (i.e. is bound to a specific IP address), so no load
balancing or connection spreading is needed when clients connect.
If the site chooses to use Murder, the proposed model applies to
the back-end servers while the multiplexer takes care of the
front-end.

The only thing I'm not sure about is file system corruption when a
node goes down, and the time taken for an fsck before the standby
node can assume the failed node's role. I wonder whether something
like ext4 will help reduce fsck times to acceptable levels.

Is this a good idea for a scalable, fault-tolerant Cyrus setup?
I've been toying with this approach for some time, for a proposed
large-system design.

Shuvam

----
Cyrus Home Page: http://www.cyrusimap.org/
List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
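
For concreteness, the paired master/slave setup described in the
quoted reply might look roughly like the sketch below, assuming
Cyrus 2.3-style rolling replication. The host names, paths and the
sync account are invented for illustration; a real deployment would
also need SASL credentials and distinct ports for the second
instance.

  # --- Server A, MASTER instance: /etc/imapd.conf (hypothetical) ---
  configdirectory: /var/lib/imap
  partition-default: /var/spool/imap
  sync_log: 1                      # record changes for rolling replication
  sync_host: serverB.example.com   # replica instance on the partner box
  sync_authname: replman           # account the replica trusts as admin
  sync_password: secret

  # --- Server A, MASTER instance: /etc/cyrus.conf (START section) ---
  START {
    recover     cmd="ctl_cyrusdb -r"
    # rolling replication: replay the sync log continuously
    syncclient  cmd="sync_client -r"
  }

  # --- Server A, REPLICA instance (holding server B's mailboxes) ---
  # /etc/imapd-replica.conf: its own config directory and spool
  configdirectory: /var/lib/imap-replica
  partition-default: /var/spool/imap-replica
  admins: replman

  # /etc/cyrus-replica.conf (SERVICES section): accept sync traffic
  SERVICES {
    syncserver  cmd="sync_server" listen="csync"
  }

  # The second instance is started with its own pair of config files:
  #   master -C /etc/imapd-replica.conf -M /etc/cyrus-replica.conf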
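
As for the proposed SAN takeover itself, the standby's promotion
could in principle be a script along the following lines. This is
only a sketch: the device name, mount point, address and init script
are made up, and in practice a cluster manager with proper fencing
(e.g. heartbeat plus STONITH) should drive these steps, since
mounting the same LUN from two live hosts will destroy the
filesystem.

  #!/bin/sh
  # Hypothetical takeover of failed node "cyrus3" by the hot spare.
  set -e

  FAILED_LUN=/dev/mapper/cyrus3-spool   # LUN with cyrus3's spool+metadata
  MNT=/var/spool/imap
  SERVICE_IP=192.0.2.13/24              # cyrus3's service address
  IFACE=eth0

  # 1. Fence the failed node first -- the LUN must never be mounted
  #    by two hosts at once.

  # 2. Repair and mount the filesystem. This fsck is the delay the
  #    proposal worries about; a journalling fs keeps it short.
  fsck -y "$FAILED_LUN"
  mount "$FAILED_LUN" "$MNT"

  # 3. Take over the failed server's IP address.
  ip addr add "$SERVICE_IP" dev "$IFACE"

  # 4. Start Cyrus against the recovered spool.
  /etc/init.d/cyrus-imapd start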