Re: Another 2.4 upgrade horror story

Bryan Hill <bhill@xxxxxxxxxxxxxxxx> · Sun, 30 Sep 2012 09:47:09 -0700

On Sep 25, 2012, at 11:57 AM, Deniss <cyrus@xxxxxx> wrote:

On 25.09.2012 15:28, Eric Luyten wrote:
On Tue, September 25, 2012 2:01 pm, Sebastian Hagedorn wrote:
Hi,

about three weeks ago we upgraded our Cyrus installation from 2.3.x to 2.4.16.
We were aware of the reindexing issue, so we took precautionary
measures, but they didn't help a lot. We've got about 7 TB of mail data for
almost 200,000 mailboxes. We did the upgrade on a Sunday and had told our
users that mail access wouldn't be possible for the whole day. After the
actual software upgrade we ran distributed scripts that triggered the index
upgrades. We started with the largest mailboxes. The idea was that after those
that took the longest had been upgraded, the rest should be OK overnight and
early Monday. However, even though our storage infrastructure was kept at 99 %
I/O saturation, progress was much slower than anticipated.

Ultimately, the server was virtually unusable for all of Monday and
parts of Tuesday. The last mailbox was finally upgraded on Thursday, although
most things were already working normally on Wednesday.

I realize that some of our problems were caused by infrastructure that's
not up to current standards, but nonetheless I would really urge you to never
again use an upgrade mechanism like that. Give admins a chance to upgrade
indexes in the background and over time.

+1

Sebastian,

Thank you for sharing your experiences.

As a site willing/needing to upgrade from 2.3.16 to 2.4.X this fall, we
are interested in learning about your storage backend characteristics.

What read/write IOPS rates were you registering before/during/after your
upgrade process?

I'd understand your reluctance to share this information in a public forum.
No offence taken whatsoever!

Kind regards,
Eric Luyten, Computing Centre VUB/ULB,     Eric.Luyten@xxxxxxxxx

The migration from 2.3 to 2.4 took about one year for our installation;
we converted ~200 TB of user data.
The first step we took was to spread the data across many nodes using Cyrus replication.
Next, we converted the nodes one by one on weekend nights to
minimize the I/O load generated by users.
In fact, Cyrus reads all the data from disk to generate the new indexes, so
the conversion is limited mainly by disk I/O, while CPU is pretty cheap nowadays.
We achieved a forced-reindex rate of around 500 GB per 8 hours at 100% disk load.
We started the forced reindex with the most active users, meanwhile allowing
users to log in and trigger the reindex of their own mailboxes.
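
To put those figures in perspective, here is a back-of-the-envelope estimate in Python. It is only a sketch: it assumes decimal units and that the reported 500 GB per 8 hours holds constant across the whole data set, and the function names are made up for illustration.

```python
# Back-of-the-envelope estimate for a disk-bound Cyrus 2.3 -> 2.4 reindex,
# based on the rate reported above: about 500 GB per 8 hours at 100% disk load.
# Assumptions: decimal units (1 TB = 1000 GB, 1 GB = 1000 MB) and a constant
# rate for the whole data set; the function names are illustrative only.

RATE_GB = 500.0      # data reindexed per window, from the post
WINDOW_HOURS = 8.0   # length of that window, from the post


def gb_per_hour() -> float:
    """Reindex throughput in GB per hour."""
    return RATE_GB / WINDOW_HOURS


def mb_per_second() -> float:
    """The same throughput expressed as sustained MB/s of mail data."""
    return RATE_GB * 1000.0 / (WINDOW_HOURS * 3600.0)


def hours_to_reindex(total_tb: float) -> float:
    """Hours of saturated disk I/O needed to reindex total_tb terabytes."""
    return total_tb * 1000.0 / gb_per_hour()


def windows_needed(total_tb: float) -> float:
    """How many 8-hour windows (e.g. weekend nights) that amounts to."""
    return hours_to_reindex(total_tb) / WINDOW_HOURS


if __name__ == "__main__":
    print(f"{gb_per_hour():.1f} GB/h = {mb_per_second():.1f} MB/s")
    print(f"200 TB: {hours_to_reindex(200):.0f} h in {windows_needed(200):.0f} windows")
    print(f"7 TB: {hours_to_reindex(7):.0f} h")
```

Under these assumptions, 500 GB per 8 hours is 62.5 GB/h, or roughly 17.4 MB/s, and 200 TB works out to 3200 hours, i.e. 400 eight-hour windows of saturated disk I/O. Applied to Sebastian's 7 TB, the same rate gives 112 hours, a little over four and a half days; his storage is different, so this is only a rough sanity check against the Sunday-to-Thursday timeline he describes.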

Sorry for hijacking this thread, but I'm curious about the preferred method of forcing a reindex on a mailbox. I know it triggers when a user logs in and accesses the mailbox. I would like to divide up the users and perform the reindex in chunks.
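
One possible way to do this, sketched in Python with the standard imaplib module: order users by activity (most active first, as Deniss did), split them into fixed-size batches, and open every mailbox of each batch read-only so the index upgrade happens before the users' own logins. This is a hypothetical helper, not a Cyrus tool. It assumes that opening a mailbox with EXAMINE triggers the 2.4 upgrade the same way a user's access does, that the admin account is allowed to open other users' mailboxes, and the default "user.<name>" hierarchy with "." as separator; the activity scores, host, and credentials are placeholders you would supply.

```python
# Hypothetical chunked "pre-upgrade" driver for Cyrus 2.4 indexes (a sketch).
# Assumptions, none confirmed by the thread:
#   * opening a mailbox with EXAMINE triggers its 2.4 index upgrade, just as a
#     user accessing it does;
#   * the admin account is allowed to open other users' mailboxes;
#   * mailboxes follow the default "user.<name>" naming with "." as separator;
#   * `activity` is a per-user score you compute yourself (e.g. from logs).
import imaplib
import re
from typing import Dict, Iterable, Iterator, List

# Matches a simple LIST reply such as: (\HasNoChildren) "." "user.alice.Sent"
LIST_RE = re.compile(rb'\((?P<flags>[^)]*)\) (?P<delim>"[^"]*"|NIL) (?P<name>.+)')


def order_by_activity(users: Iterable[str], activity: Dict[str, int]) -> List[str]:
    """Most active users first; users without a score go last."""
    return sorted(users, key=lambda u: activity.get(u, 0), reverse=True)


def chunks(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` users (e.g. one per night)."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def parse_list_line(line: bytes) -> str:
    """Return the mailbox name from one LIST reply line, quotes removed."""
    m = LIST_RE.match(line)
    if m is None:
        raise ValueError(f"unexpected LIST reply: {line!r}")
    name = m.group("name")
    if name.startswith(b'"') and name.endswith(b'"'):
        name = name[1:-1]
    return name.decode("ascii")


def user_mailboxes(conn: imaplib.IMAP4, user: str) -> List[str]:
    """The user's INBOX plus every subfolder; each folder has its own index."""
    names = [f"user.{user}"]
    typ, data = conn.list('""', f"user.{user}.*")
    if typ == "OK":
        for item in data:
            # None means "no subfolders"; tuples carry IMAP literals, which this
            # simple sketch skips rather than parses.
            if isinstance(item, bytes):
                names.append(parse_list_line(item))
    return names


def touch_batch(host: str, admin: str, password: str, users: List[str]) -> List[str]:
    """EXAMINE every mailbox of each user in the batch; return those that failed."""
    failed: List[str] = []
    conn = imaplib.IMAP4_SSL(host)
    try:
        conn.login(admin, password)
        for user in users:
            for mbox in user_mailboxes(conn, user):
                # Python 3's imaplib sends the name verbatim, so quote it here
                # (names containing '"' or '\\' are not handled).
                typ, _ = conn.select(f'"{mbox}"', readonly=True)
                if typ != "OK":
                    failed.append(mbox)
    finally:
        conn.logout()
    return failed
```

A nightly run would then loop over chunks(order_by_activity(users, scores), N) and call touch_batch for one batch per maintenance window, keeping the I/O spike confined to that window instead of to the first morning after the upgrade.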

Thanks,
Bryan

---
Bryan D. Hill
UCSD Physics Computing Facility
CTBP Systems Support

9500 Gilman Dr.  # 0319
La Jolla, CA 92093
+1-858-534-5538
bhill@xxxxxxxx
AIM:  pozvibesd
Web:  http://www.physics.ucsd.edu/pcf

----
Cyrus Home Page: http://www.cyrusimap.org/
List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
To Unsubscribe:
https://lists.andrew.cmu.edu/mailman/listinfo/info-cyrus