Design/HW for cost-efficient NL archive >= 0.5PB?

I am new to Gluster, but so far it seems very attractive for my needs. I am trying to assess its suitability for a cost-efficient storage problem I am tackling. Hopefully someone can help me figure out how best to solve it.

Capacity: 
Start with around 0.5PB usable

Redundancy: 
Two replicas on non-RAID bricks is not sufficient. Either three replicas on non-RAID bricks, or some combination of two replicas and RAID?

File types: 
Large files, around 400-1500MB each. 

Usage pattern: 
Archive (not sure whether this counts as nearline or not...) with files being added at around 200-300GB/day (300-400 files/day). Very few reads, on the order of 10 file accesses per day. Concurrent reads are highly unlikely.

The two main factors for me are cost and redundancy. Losing data is not an option, since this is an archive solution. Cost per usable TB is the other key factor, as we see growth estimates of 100-500TB/year.
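
To make the cost comparison concrete, here is the back-of-the-envelope arithmetic I have been using (a rough Python sketch; the layouts and the 12-disk RAID6 group size are assumptions for illustration, not recommendations):

# Back-of-the-envelope raw-capacity comparison.
# Layouts and the 12-disk RAID6 group size are illustrative assumptions.
USABLE_TB = 500  # target usable capacity

layouts = {
    # name: usable fraction of raw capacity
    "3x replica, JBOD bricks":   1 / 3,
    "2x replica, 12-disk RAID6": (10 / 12) / 2,  # 2 parity disks per group
}

for name, efficiency in layouts.items():
    raw_tb = USABLE_TB / efficiency
    print(f"{name}: {raw_tb:.0f} TB raw for {USABLE_TB} TB usable")

With those assumptions, 3x replication needs 1500 TB raw against 1200 TB for 2x replication on RAID6, so RAID6 wins on $/TB as long as the drives dominate the cost.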

Looking purely at $/TB, a RAID-based approach sounds more efficient to me. But RAID rebuild times with large arrays of high-capacity drives sound really scary. I am not sure if something smart can be done, given that we would still have a replica left during the rebuild?
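
For the rebuild worry, this is the kind of estimate I mean (idealized: a sustained sequential rebuild with no competing I/O; the drive size and rates are assumptions):

# Rough RAID rebuild-time estimate, ignoring rebuild throttling and
# foreground I/O. Drive size and rebuild rates are assumptions.
drive_tb = 4
for rate_mb_s in (50, 100, 150):
    hours = drive_tb * 1e12 / (rate_mb_s * 1e6) / 3600
    print(f"{drive_tb} TB drive at {rate_mb_s} MB/s: {hours:.0f} h rebuild")

That is roughly 7-22 hours per failed drive even under ideal conditions, which is why I wonder whether the surviving replica makes such a long rebuild window tolerable.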

So, any suggestions for possible, cost-efficient solutions?

- Any experience with dense servers? What is advisable: 24/36/50/60 slots? (A rough server-count sketch follows after this list.)
- SAS expanders/storage pods?
- RAID vs non-RAID?
- Number of replicas, etc.?
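
To give an idea of the scale behind the slot-count question, here is a quick sketch in Python (the 4 TB drives and the 1500 TB raw target, i.e. 500 TB usable at 3x replication, are assumed figures, not vendor specs):

# Hypothetical chassis count per slot density. DRIVE_TB and
# RAW_NEEDED_TB are assumed figures for illustration only.
DRIVE_TB = 4
RAW_NEEDED_TB = 1500  # e.g. 500 TB usable at 3x replication

for slots in (24, 36, 50, 60):
    raw_per_server = slots * DRIVE_TB
    servers = -(-RAW_NEEDED_TB // raw_per_server)  # ceiling division
    print(f"{slots}-slot chassis: {raw_per_server} TB raw each, "
          f"{servers} servers for {RAW_NEEDED_TB} TB raw")

With those numbers, denser chassis cut the server count from 16 (24-slot) down to 7 (60-slot), which is where I expect most of the cost difference to come from.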

Best, 

Fredrik
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users
