> I need a sanity check here.
>
> I had a single storage partition that I've grown to ~400GB to house
> about 270,000 mailboxes. I managed to reduce that number by around
> 115,000 by purging out accounts no one's logged into, still leaving
> quite a mess of accounts. I've been migrating them with some perl and
> the mailbox move stuff in cyradm to four new partitions of 75GB each and
> I'm finding that I'm very quickly running out of space due to the
> breaking up of the singleinstancestore storage gains.
>
> To remedy this, I'm thinking about traversing the mailboxes on each
> partition building a database of checksums to identify identical
> messages, then replacing the duplicated content with hard links and
> reconstructing the user's mailbox (for good measure, although it
> shouldn't be necessary).
>
> I imagine the storage savings with this plan would be huge, but it
> screams danger and I'm wondering if I should bother. However, by my
> calculations, I'll need another 600GB (for a total of 900GB) instead of
> the current 400GB (which is actually only at 41% right now) and I simply
> don't have the space.
>
> The code to do this seems pretty trivial, but has anyone had to do this
> before/is there a tool out there already that does it?

I ran some tests a long time ago and, IIRC, hard-linking the duplicates
worked without any problems.

Possible tools are here:

ftp://ftp.redhat.com/pub/redhat/mirror-tools/
http://code.google.com/p/hardlinkpy/

Simon

----
Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
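
For reference, the checksum-and-hard-link pass described in the quoted
question could look roughly like the sketch below. It is only an
illustration, not a tested tool and not how the programs linked above
work. The names file_digest, is_message_file and dedupe_tree are made up
for this example, and the filter assumes message files are named
"<uid>." while the cyrus.* metadata files must be left alone. It defaults
to a dry run, and it should only be tried on a copy of a spool or with
the mail services stopped.

```python
import hashlib
import os
import stat


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def is_message_file(name):
    # Assumption: message files look like "123." and everything else
    # (cyrus.header, cyrus.index, cyrus.cache, ...) is metadata to skip.
    return name.endswith(".") and name[:-1].isdigit()


def dedupe_tree(root, dry_run=True):
    """Hard-link identical message files under root.

    Returns the number of bytes that were (or, in a dry run, would be)
    reclaimed.
    """
    seen = {}  # (device, size, digest) -> first path seen with that content
    saved = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            if not is_message_file(name):
                continue
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if not stat.S_ISREG(st.st_mode):
                continue
            # The device is part of the key because hard links cannot
            # cross filesystems (that is, partitions).
            key = (st.st_dev, st.st_size, file_digest(path))
            first = seen.setdefault(key, path)
            if first == path or os.path.samefile(first, path):
                continue  # first copy, or already linked to it
            saved += st.st_size
            if not dry_run:
                # Link under a temporary name, then rename over the
                # duplicate so the message path never goes missing.
                tmp = path + ".dedupe-tmp"
                os.link(first, tmp)
                os.replace(tmp, path)
    return saved
```

Calling dedupe_tree("/path/to/partition") with the default dry_run=True
only reports how many bytes could be reclaimed; the path here is a
placeholder. Hashing every file is the simple approach; grouping by size
first and hashing only files that share a size would avoid most of the
reads. Because linked files share one inode, they also share owner,
mode and timestamps, which is worth checking against whatever the server
expects.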