Re: UC Davis Cyrus Incident September 2007

On Tue, 16 Oct 2007, Vincent Fox wrote:

> So here's the story of the UC Davis (no, not Berkeley) Cyrus conversion.....

[snip]

> 5th STEP: Cyrus migration
> ====================
>
> The politics of educational environment is that you MUST do massive
> changeouts like this during summer quarter.  So the last couple of months
> of summer we were busily migrating all the UWash users to Cyrus.
> About 29K users to ms1, and 23K users to ms2.  Everything worked great.
> Typically about 500 Cyrus processes running.
>
> 6th STEP:  The excrement hits the rotating blades
> ===================================
>
> About a week before classes actually start is when all the kids start moving
> back into town and mailing all their buds.  We saw process numbers go
> from 500-ish to as high as 5,000.  Load would climb radically after passing
> 2,000 processes and systems became slow to respond.  This persisted for
> 4 days with us on the phone with Ken & Jeff and anyone else who would
> talk to us, trying to find the right tweaks on the Cyrus software.  We tried
> moving to quota-legacy, using BDB for the delivery database, and a few
> other tweaks that were suggested, but none brought us substantial relief.

I feel your pain.  The first week of fall term is always the time when we 
see how well we did our planning and testing.  :)

Luckily, we haven't had problems like this with Cyrus, but several other
software upgrades have bitten us in the ass in the same way.
For example, we upgraded to Horde3 this summer.  Everything was humming 
along nicely (increased load averages, but still snappy) until the first 
day of fall term.  Then the MySQL server load average climbed to 100 and 
Horde slowed to a crawl.  It took about 4 days to figure out that a 
particularly obnoxious SQL query was the problem and needed an additional 
index (in hindsight, it was pretty obvious).
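For anyone who wants to see the effect in miniature, here is a rough sketch of how a missing index shows up in a query plan. Everything in it is an assumption for illustration: it uses SQLite from the Python standard library as a stand-in for MySQL, and the table, column, and index names (demo_prefs, owner_uid, idx_demo_prefs_owner) are hypothetical, not the query or schema that actually bit us.

```python
import sqlite3


def plan_for(conn, sql, params=()):
    """Return the 'detail' strings SQLite reports for a statement's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The fourth column of each EXPLAIN QUERY PLAN row is the readable detail.
    return [row[3] for row in rows]


# Hypothetical table standing in for an application table that is
# looked up by user on every page load.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE demo_prefs (owner_uid TEXT, pref_name TEXT, pref_value TEXT)"
)
conn.executemany(
    "INSERT INTO demo_prefs VALUES (?, ?, ?)",
    [(f"user{i}", "signature", "x") for i in range(1000)],
)

query = "SELECT pref_value FROM demo_prefs WHERE owner_uid = ?"

# Without an index, the lookup has to scan the whole table.
before = plan_for(conn, query, ("user42",))

# Add the index (hypothetical name) and ask for the plan again.
conn.execute("CREATE INDEX idx_demo_prefs_owner ON demo_prefs (owner_uid)")
after = plan_for(conn, query, ("user42",))

print("before:", before)
print("after: ", after)
```

The "before" plan should report a table scan and the "after" plan should mention the new index. On MySQL the rough equivalent is running EXPLAIN on the suspect query and adding the index with CREATE INDEX, though the plan output looks different there.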

So, some years we get things right and fall term runs smoothly.  Some 
years things go badly.  :)

Whenever possible, I really prefer to make gradual changes and slowly
ramp up to production numbers.

This is a fascinating story, so please keep us all posted with your 
findings!

 	Andy
----
Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
