Dear Bron,

> http://www.newegg.com/Product/Product.aspx?Item=N82E16822148413
>
> 2TB - US $109.

Don't want to nit-pick here, but the effective price we pay is about ten times this. To set up a mail server with a few TB of disk space, we usually end up deploying a separate chassis with RAID controllers and a RAID array, with FC connections from servers, etc. All this adds up to about $1,000/TB of usable space if you're using something like the "low-end" IBM DS3400 box or the Dell/EMC equivalent. This is even with inexpensive 7200RPM SATA-II drives, not 15KRPM SAS drives.

http://www-07.ibm.com/storage/in/disk/ds3000/ds3400/

And most of our customers actually double this cost because they keep two physically identical chassis for redundancy. (We recommend this too, because we can't trust a single RAID 5 array to withstand controller or PSU failures.) In that case, it's $2,000/TB.

And you do reach 5-10 TB of email store quite rapidly --- our company has many corporate clients (< 500 email users) whose IMAP store has reached 4TB. No one wants to enforce disk quotas (corporate policy), and most users don't want to delete emails on their own.

We keep hearing the logic that storage is cheap, and stories of cloud storage through Amazon and unlimited mailboxes on Gmail are reinforcing the belief. But at the ground level, in mid-market corporate IT budgets, storage costs in data centres (as against inside desktops) are still too high to be trivial, and their prices have little to do with the prices of raw SATA-II drives. A fully-loaded DS3400 costs a little over $12,000 in India, with a full set of 1TB SATA-II drives from IBM, but even with the high cost of IBM drives, the drives themselves contribute less than 30% of the total cost.

If we really want to put our collective money where our mouth is, and deliver the storage-is-cheap promise at the ground level, we need to rearchitect every file server and IMAP server to work in map-reduce mode and use disks inside desktops.
Anyone game for this project? :)

> Now de-duping messages on copy is valuable, not so much because of
> the space it saves, but because of the IO it saves. Copying the file
> around is expensive.
>
> De-duping components of messages and then reconstructing? Not so much.
> You'll be causing MORE IO in general looking for the message, finding the
> parts.

I agree. My aim was not to reduce IOPS but to cut disk space usage. There are two areas where we are seeing a huge increase in "inactive" disk utilisation for emails.

One is the archive, which is kept for security and compliance reasons. Every company we work with wants an archive with at least a few years' retention. They search the archive every few weeks to trace "lost" emails, not for compliance reasons but to find missing information. This means that we can't ask them to move the data out to removable storage.

The second area is shared mail folders, where all communication with each client/topic/project is stored practically forever.

A 500-user company can easily acquire an email archive of 2-5TB. I don't care how much the IO load of that archive server increases, but I'd like to reduce disk space utilisation. If the customer can stick to 2TB of space requirements, he can use a desktop with two 2TB drives in RAID 1, and get a really cheap archive server. If this figure reaches 3-4TB, he goes into a separate RAID chassis --- the hardware cost goes up 5-10 times. These are tradeoffs a lot of small to mid-sized companies in my market fuss about.

And in a more generic context, I am seeing that all kinds of intelligent de-duping of infrequently-accessed data are going to become the crying need of every mid-sized and large company. Data is growing too fast, and no one wants to impose user discipline or data cleaning. When we tell the business head "This is crazy!", he turns around and tells the CTO "But disk space is cheap! Haven't you heard of Google? What are you cribbing about?
You must be doing something really inefficient here, wasting money!"

thanks and regards,
Shuvam

----
Cyrus Home Page: http://www.cyrusimap.org/
List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
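
For readers who want to see what the part-level de-duping discussed in this message could look like, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not how Cyrus or any other IMAP server stores mail: the `PartStore` class, the `leaf_parts` and `make_message` helpers, and the example addresses are all hypothetical names made up for this example. Each leaf MIME part is hashed with SHA-256 and stored once, so an attachment sent to many mailboxes takes space only once, at the cost of the extra lookups Bron describes.

```python
import hashlib
from email import message_from_bytes
from email.message import EmailMessage


def leaf_parts(raw):
    """Yield the decoded payload of every non-multipart MIME part."""
    msg = message_from_bytes(raw)
    for part in msg.walk():
        if not part.is_multipart():
            yield part.get_payload(decode=True) or b""


class PartStore:
    """Hypothetical in-memory content-addressed store for MIME parts."""

    def __init__(self):
        self.blobs = {}          # sha256 hex digest -> payload bytes
        self.logical_bytes = 0   # bytes we would store without de-duping

    def add_message(self, raw):
        """Store a message's parts; return the list of part digests."""
        keys = []
        for payload in leaf_parts(raw):
            key = hashlib.sha256(payload).hexdigest()
            self.blobs.setdefault(key, payload)  # keep the first copy only
            self.logical_bytes += len(payload)
            keys.append(key)
        return keys

    def stored_bytes(self):
        """Bytes actually held after de-duping."""
        return sum(len(blob) for blob in self.blobs.values())


def make_message(to, body, attachment):
    """Build a small test message with one binary attachment (demo helper)."""
    msg = EmailMessage()
    msg["From"] = "sender@example.com"
    msg["To"] = to
    msg["Subject"] = "Quarterly report"
    msg.set_content(body)
    msg.add_attachment(attachment, maintype="application",
                       subtype="octet-stream", filename="report.bin")
    return msg.as_bytes()
```

Feeding two messages that carry the same attachment into one `PartStore` and comparing `logical_bytes` with `stored_bytes()` shows the saving. A real archive would also need to keep headers and the MIME structure so that messages can be reconstructed byte for byte, which this sketch deliberately leaves out.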