Lazy Erasure Coding for RADOS

Hi all,

I would like to build a fully geo-redundant and highly available storage
solution.  I read a research paper that describes the architecture of the
Microsoft Azure deployment (looking to hit several hundred petabytes soon).
This was presented at the 23rd ACM Symposium on Operating Systems Principles
(SOSP 2011).
Information and paper here:  

http://blogs.msdn.com/b/windowsazure/archive/2011/11/21/windows-azure-storage-a-highly-available-cloud-storage-service-with-strong-consistency.aspx

What I took away from it was that Microsoft considers 3 local copies to be
the minimum required for protection.  However, they also realized that you
cannot afford to scale to an exabyte with a 3x storage overhead.  So they
have a lazy background process that converts objects stored with 3x
redundancy into objects that are erasure coded with Reed-Solomon, at a 1.3x
or 1.6x overhead.  At the same time, the RS coding provides better long-term
availability than the 3x replication approach.
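
To make the overhead numbers concrete, here is a rough sketch of the arithmetic.  For a code that splits an object into k data fragments and adds m parity fragments, the raw storage per byte of user data is (k + m) / k.  The 10+3 and 10+6 layouts below are only my assumption for how one might arrive at 1.3x and 1.6x; I am not claiming these are the parameters Azure uses, and the function names are made up for illustration.

```python
from fractions import Fraction


def storage_overhead(data_fragments: int, parity_fragments: int) -> Fraction:
    """Raw bytes stored per byte of user data for a k+m erasure code."""
    if data_fragments <= 0 or parity_fragments < 0:
        raise ValueError("need k > 0 and m >= 0")
    return Fraction(data_fragments + parity_fragments, data_fragments)


def raw_capacity_pb(user_pb: int, overhead: Fraction) -> float:
    """Raw capacity (in PB) needed to hold user_pb of user data."""
    return float(user_pb * overhead)


# Hypothetical layouts, chosen only to reproduce the 1.3x and 1.6x figures.
layouts = {
    "3x replication": Fraction(3),
    "RS 10+3": storage_overhead(10, 3),
    "RS 10+6": storage_overhead(10, 6),
}

if __name__ == "__main__":
    for name, overhead in layouts.items():
        print(f"{name}: {float(overhead):.1f}x overhead, "
              f"{raw_capacity_pb(1000, overhead):.0f} PB raw per 1000 PB stored")
```

At exabyte scale the gap between 3x and 1.3x is well over a thousand petabytes of raw disk, which is the cost argument in a nutshell.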

Specifics of the RS coding are here (best paper award at Usenix):  

https://www.usenix.org/conference/usenixfederatedconferencesweek/erasure-coding-windows-azure-storage

As far as I have found, there are two implementations of RS-coded object
stores out there:
    Commercial - Cleversafe (http://www.cleversafe.com/)
    Open Source - Tahoe-LAFS (http://www.tahoe-lafs.org/)

Given a certain availability metric, stronger erasure coding can make a HUGE
difference in the cost of deployment.  See "Erasure Coding vs Replication: A
Quantitative Comparison" here:
http://oceanstore.cs.berkeley.edu/publications/papers/pdf/erasure_iptps.pdf
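
As a back-of-the-envelope illustration of that comparison, the sketch below estimates the chance that an object is unrecoverable under a deliberately simple model: every copy or fragment sits on a different node, each node is down independently with probability p, and no repair happens.  That model, the value of p, and the 10+6 layout are all my own assumptions for illustration, not figures taken from the paper.

```python
from math import comb


def replication_loss_probability(copies: int, p: float) -> float:
    """An object is lost only if every one of its copies is unavailable."""
    return p ** copies


def erasure_loss_probability(k: int, m: int, p: float) -> float:
    """Loss probability for a k+m code where any k of the n = k + m
    fragments suffice to rebuild the object, so the object is lost only
    when more than m fragments are unavailable."""
    n = k + m
    return sum(
        comb(n, failed) * p ** failed * (1 - p) ** (n - failed)
        for failed in range(m + 1, n + 1)
    )


if __name__ == "__main__":
    p = 0.01  # assumed per-node unavailability, purely illustrative
    print("3x replication:", replication_loss_probability(3, p))
    print("RS 10+6       :", erasure_loss_probability(10, 6, p))
```

Under these assumptions the 10+6 code comes out far less likely to lose an object than 3x replication while using roughly half the raw storage.  The result is sensitive to p and to the number of parity fragments, so it is worth rerunning with your own numbers rather than trusting one data point.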

Has any thought been given to implementing stronger erasure coding in RADOS
(either directly or in a lazy fashion)?

Thanks in advance for any thoughts,

- Steve

---
Stephen Perkins
NetMass Incorporated
800-731-2737 x5005
+1-972-838-1520 x5005
perkins@xxxxxxxxxxx
 
NetMass™
The safe data company.



