On Fri, Jan 26, 2007 at 12:20:15PM -0800, Tom Samplonius wrote:
> ----- Wesley Craig <wes@xxxxxxxxx> wrote:
> > Close. imapd, pop3d, lmtpd, and other processes write to the log.
> > The log is read by sync_client. This merely tells sync_client what
> > (probably) has changed. sync_client roll up certain log items, e.g.,
> > it may decide to compare a whole user's state rather than just
> > looking at multiple mailboxes. Once it decides what to compare, it
> > retrieves IMAP-like state information from sync_server (running on
> > the replica) and pushes those changes that are necessary.
>
> And this exposes the big weakness with Cyrus syncing: there is only
> a single sync_client, and it is very easy for it get behind.

Which is why we have the following:

* a logwatcher on imapd.log which emails us on bailing out or other
  "unrecognised" log lines

* the system monitoring scripts do a 'du -s' on the sync directory
  every 2 minutes and store the value in a database so our status
  commands can see if any store is behind (the trigger for noticing
  is 10kb, that's a couple of minutes worth of log during the U.S.
  day). This also emails us if it gets above 100kb (approx 20 mins
  behind)

* a "monitorsync" process that runs from cron every 10 minutes and
  reads the contents of the sync directory, comparing any log-(\d+)
  file's PID with running processes to ensure it's actually being run
  and tries to sync_client -r -f the file if there isn't one. It also
  checks that there is a running sync_client -r process (no -f) for
  the store.

* a weekly "checkreplication" script which logs in as each user to
  both the master and replica via IMAP and does a bunch of lists,
  examines, even fetches and compares the output to ensure they are
  identical.

Between all that, I'm pretty comfortable that replication is correct
and we'll be told if it isn't. It's certainly helped us find our share
of issues with the replication system!
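The size check and the orphaned-log check from the list above could be sketched roughly as follows. This is a minimal Python sketch, not the actual FastMail scripts: the function names, the "ok"/"behind"/"alert" labels, and the choice to sum file sizes (rather than shell out to 'du -s', which reports disk blocks) are all assumptions made for illustration. Only the 10kb and 100kb thresholds and the log-(\d+) naming come from the description above. Note that a live PID only shows that some process with that number exists, not that it is a sync_client.

```python
import os
import re
from pathlib import Path

# Thresholds from the description above: 10 kB means "behind",
# above 100 kB (roughly 20 minutes of backlog) means "send an email".
BEHIND_BYTES = 10 * 1024
ALERT_BYTES = 100 * 1024

# Log files that a sync_client run has claimed are named log-<pid>.
LOG_PID_RE = re.compile(r"^log-(\d+)$")


def sync_dir_size(sync_dir):
    """Sum the sizes of the regular files directly under sync_dir.

    Assumption: this approximates 'du -s' by apparent file size.
    """
    return sum(p.stat().st_size for p in Path(sync_dir).iterdir() if p.is_file())


def classify_backlog(size_bytes):
    """Map a backlog size to a hypothetical status label."""
    if size_bytes > ALERT_BYTES:
        return "alert"
    if size_bytes > BEHIND_BYTES:
        return "behind"
    return "ok"


def pid_is_running(pid):
    """Return True if a process with this PID exists (POSIX semantics)."""
    try:
        os.kill(pid, 0)  # signal 0 checks for existence without signalling
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def orphaned_logs(sync_dir, is_running=pid_is_running):
    """Return log-<pid> files whose PID no longer matches a running process."""
    orphans = []
    for path in sorted(Path(sync_dir).iterdir()):
        match = LOG_PID_RE.match(path.name)
        if match and not is_running(int(match.group(1))):
            orphans.append(path)
    return orphans
```

A cron job along the lines of "monitorsync" would then hand each path returned by orphaned_logs() to sync_client -r -f, and a monitoring job would record classify_backlog(sync_dir_size(...)) for each store.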

> > For your situation, Janne, you might want to explore sharing the sync
> > directory. sync_client and sync_server have interlock code, tho I
> > haven't reviewed it for this specific scenario.
>
> Since the sync directory is specific to the master server, why would
> you share it?
>
> Unless, you want to have multiple Cyrus server all pretend to be the
> master, and log all of their changes to the same sync log. You would
> probably hit the sync_client bottleneck pretty fast this way.
>
> Plus, there would be a lot of contention on the sync logs if
> multiple servers are appending records to the same file. GFS is not
> fast.

Yeah, that would suck. Running multiple sync_clients is going to suck
too, because they'll get locked out at the replica end, where only one
sync_server will run at once. Messy.

I still think one sync_client is the way to go if you're going to do
this config at all, but the whole thing sounds way less scalable than
what we're doing at FastMail: lots of small cyrus instances (multiple
per physical server) and an interlaced set of replicas, such that
losing any one server will spread the master load out evenly over many
other physical machines.

But I'm not the one who is deciding which tradeoffs to use. I just
know our current layout has been nice to us and will scale
indefinitely. It takes a decent amount of management software to
decide where to place users, and to make sure we don't break users who
have folder sharing between them and stuff, though.

Bron.
----
Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html