Re: Lazy Erasure Coding for RADOS

On Monday, August 6, 2012, Stephen Perkins wrote:
>
> Hi all,
>
> I would like to build a fully geo-redundant and highly available storage
> solution.  I read a research paper that describes the architecture of the
> Microsoft Azure deployment (looking to hit several hundred petabytes soon).
> This was presented at the 23rd ACM Symposium on Operating Systems Principles.
> Information and paper here:
>
> http://blogs.msdn.com/b/windowsazure/archive/2011/11/21/windows-azure-storage-a-highly-available-cloud-storage-service-with-strong-consistency.aspx
>
> The thing I took away from it was that Microsoft considered 3 copies locally
> to be the minimum number required for protection.  However, they also
> realized that you cannot afford to scale to an exabyte with a 3x overhead
> for storage.  So they have a lazy process that runs behind the scenes and
> converts objects stored with 3x redundancy into objects that are erasure
> coded with Reed-Solomon at a 1.3x or 1.6x overhead.  At the same time, the
> RS coding provides better long-term availability than the 3x replication
> approach.
>
> Specifics of the RS coding are here (best paper award at Usenix):
>
> https://www.usenix.org/conference/usenixfederatedconferencesweek/erasure-coding-windows-azure-storage
>
> As far as I have found, there are two implementations of RS-coded object
> stores out there:
>   Commercial - Cleversafe (http://www.cleversafe.com/)
>   Open Source - Tahoe-LAFS (http://www.tahoe-lafs.org/)
>
> Given a certain availability metric, stronger erasure coding can make a HUGE
> difference in the cost of deployment.  See "Erasure Coding vs Replication: A
> Quantitative Comparison" here:
> http://oceanstore.cs.berkeley.edu/publications/papers/pdf/erasure_iptps.pdf
>
> Has any thought been given to implementing stronger erasure coding in RADOS
> (either directly or in a lazy fashion)?

It's been thought about in the "RADOS should support erasure codes
instead of just replication" sense, but not in the "we would do this
to implement it" sense. I don't know how Azure's storage system works
(will need to check out that paper!), but implementing erasure coding
in the OSDs would essentially require re-implementing or extending all
of their difficult code, which is obviously not something we're eager
to do at this time.
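
For reference, the overhead figures in the quoted message can be checked
with a quick sketch. This is only illustrative arithmetic, not anything
from RADOS or the Azure papers: the function names are made up, and the
(k, m) pairs below are hypothetical parameters chosen because they give
1.3x and 1.6x, not necessarily the codes Azure uses. It assumes an MDS
code such as Reed-Solomon, where an object is split into k data fragments
plus m parity fragments and any k of the k + m fragments can rebuild it.

```python
from fractions import Fraction


def replication_overhead(copies):
    """Return (raw bytes stored per logical byte, losses tolerated)."""
    # N full copies survive the loss of any N - 1 of them.
    return Fraction(copies), copies - 1


def erasure_overhead(k, m):
    """Return (raw bytes stored per logical byte, losses tolerated).

    Assumes an MDS code: k data fragments, m parity fragments,
    any k of the k + m fragments are enough to reconstruct.
    """
    return Fraction(k + m, k), m


if __name__ == "__main__":
    # Hypothetical schemes for comparison only.
    schemes = {
        "3x replication": replication_overhead(3),
        "RS k=10, m=3": erasure_overhead(10, 3),
        "RS k=10, m=6": erasure_overhead(10, 6),
    }
    for name, (overhead, tolerated) in schemes.items():
        print(f"{name}: {float(overhead):.2f}x raw storage, "
              f"tolerates {tolerated} lost copies/fragments")
```

Under these assumptions the hypothetical k=10, m=3 code stores 13/10 of
the logical data and survives 3 lost fragments, while 3x replication
stores 3 times the data and survives 2 lost copies. That gap is the cost
argument in the quoted message; it says nothing about how hard the
recovery and consistency paths would be to implement in the OSDs.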