Great thread. Here are some real-world numbers based on our spools here at BU.

One of our masters has 4,800 users and 22,000 mailboxes, and is using about 374G of disk. Based on the md5 files for these users, there are 6,046,363 messages. If I take the first md5 value (the md5 of the message file, if I understand this correctly), then sort and uniq, I get 5,891,974 unique messages. So if we deduplicated all of those messages, the message count would shrink to 97.4% of the original. Assuming an even distribution of message sizes, 374G would drop to about 364.45G (374G × 5,891,974 / 6,046,363), a saving of under 10G. Unfortunately, not an obvious huge win.

However, I think the md5 of the message file includes the headers, which are more likely than the body content to be unique. (Because of legacy support for UW IMAP, we often end up routing mail differently for users on the same master, so the headers of the same message sent to two people can differ.)

Isn't the easy hack for dedup just to look at the above md5 files and then create the appropriate hard links? This could be done by a nightly trawl of the spool space. A bigger win would be to store the headers separately from the message bodies, but that's a lot more work.

-nik
----
Cyrus Home Page: http://www.cyrusimap.org/
List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
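For illustration, here is a minimal sketch of the "nightly trawl plus hard links" idea, plus the back-of-envelope size projection. It is an assumption-laden sketch, not a tested Cyrus tool. It hashes each message file directly with MD5 instead of parsing Cyrus's md5 files (their exact format is not assumed here). It assumes all spool files live on one filesystem and are not rewritten in place while it runs. The function names (`md5_of_file`, `dedup_by_hard_link`, `projected_size`) are hypothetical. Because the hash covers the whole file, headers included, messages that differ only in routing headers will not be merged, which matches the limitation described above.

```python
import filecmp
import hashlib
import os
from collections import defaultdict


def projected_size(total_gb, n_messages, n_unique):
    """Scale total size by unique/total, assuming duplicates are average-sized."""
    return total_gb * n_unique / n_messages


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file's full contents (headers and body)."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def dedup_by_hard_link(paths, dry_run=True):
    """Replace byte-identical files with hard links to one canonical copy.

    Returns the bytes held by distinct duplicate inodes, i.e. the space freed
    assuming no links to those inodes exist outside the scanned paths.
    """
    by_digest = defaultdict(list)
    for p in paths:
        by_digest[md5_of_file(p)].append(p)

    saved = 0
    for group in by_digest.values():
        canonical = group[0]
        cs = os.stat(canonical)
        seen = {(cs.st_dev, cs.st_ino)}
        for dup in group[1:]:
            st = os.stat(dup)
            key = (st.st_dev, st.st_ino)
            if key == (cs.st_dev, cs.st_ino):
                continue  # already linked to the canonical copy
            if st.st_dev != cs.st_dev:
                continue  # hard links cannot cross filesystems
            # MD5 alone is not a safe identity; confirm byte-for-byte.
            if not filecmp.cmp(canonical, dup, shallow=False):
                continue
            if key not in seen:
                seen.add(key)
                saved += st.st_size
            if not dry_run:
                tmp = dup + ".dedup-tmp"
                os.link(canonical, tmp)
                os.replace(tmp, dup)  # atomic rename over the duplicate
    return saved
```

With the numbers above, `projected_size(374, 6046363, 5891974)` comes out at roughly 364.45. The `os.replace` step means a reader never sees the duplicate's path missing, and the linked file inherits the canonical copy's owner, mode, and mtime. That is usually fine on a spool owned by a single user, but it is worth checking before running it for real.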