Supposedly, on 2013-Apr-18, at 14.08 PDT(-0700), someone claiming to be Josh Durgin scribed:

> On 04/18/2013 01:47 PM, Sage Weil wrote:
>> On Thu, 18 Apr 2013, Plaetinck, Dieter wrote:
>>> sorry to bring this up again, googling revealed some people don't like the subject [anymore].
>>>
>>> but I'm working on a new +- 3PB cluster for storage of immutable files,
>>> and it would be either all cold data, or mostly cold. 150MB avg filesize, max size 5GB (for now).
>>> For this use case, my impression is erasure coding would make a lot of sense
>>> (though I'm not sure about the computational overhead of storing and loading objects..?
>>> outbound traffic would peak at 6 Gbps, but I can make it way less and still keep a large cluster
>>> by taking away the small set of hot files; inbound traffic would be minimal).
>>>
>>> I know that the answer a while ago was "no plans to implement erasure coding"; has this changed?
>>> if not, is anyone aware of a similar system that does support it? I found QFS, but that's meant for batch processing, has a single 'namenode', etc.
>>
>> We would love to do it, but it is not a priority at the moment (things
>> like multi-site replication are in much higher demand). That of course
>> doesn't prevent someone outside of Inktank from working on it :)
>>
>> The main caveat is that it will be complicated. For an initial
>> implementation, the full breadth of the rados API probably wouldn't be
>> supported for erasure/parity encoded pools (things like rados classes and
>> the omap key/value API get tricky when you start talking about parity).
>> But for many (or even most) use cases, objects are just bytes, and those
>> restrictions are just fine.
>
> I talked to some folks interested in doing a more limited form of this
> yesterday. They started a blueprint [1]. One of their ideas was to have
> erasure coding done by a separate process (or perhaps a thread). It would
> erasure-code an object, use librados to store the encoded pieces in a
> separate pool, and finally leave a marker in place of the original object
> in the first pool.
>
> When the OSD detected this marker, it would proxy the request to the
> erasure coding thread/process, which would service the request from the
> second pool for reads, and potentially have writes move the data back to
> the first pool in a tiering sort of scenario.
>
> I might have misremembered some details, but I think it's an
> interesting way to get many of the benefits of erasure coding with a
> relatively small amount of work compared to a fully native OSD solution.

Greetings, I'm one of those individuals :)

Our thinking is evolving on this, and I think we can keep most of the work out of the main machinery of Ceph and simply require a modified client that runs the "proxy" function against the "hot" pool OSDs. I'm even wondering whether it could be prototyped in FUSE. I will be writing this up in the next day or two in the blueprint below. Josh has the idea basically correct.

>
> Josh

Christopher

>
> [1] http://wiki.ceph.com/01Planning/02Blueprints/Dumpling/Erasure_encoding_as_a_storage_backend

--
李柯睿
Check my PGP key here: https://www.asgaard.org/~cdl/cdl.asc
Current vCard here: https://www.asgaard.org/~cdl/cdl.vcf
Check my calendar availability: https://tungle.me/cdl
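
P.S. To make the flow above a little more concrete, here is a very rough sketch of the demote/read path as a standalone python-rados client. The pool names ("hot"/"cold"), the marker format, and the single XOR parity chunk (a stand-in for a real (k+m) erasure code such as Reed-Solomon) are just assumptions for illustration, not the design that will go into the blueprint.

#!/usr/bin/env python
# Sketch of the client-side "proxy" idea: erasure-code an object into a
# cold pool and leave a marker in the hot pool; reads check for the
# marker and reassemble the object from the cold pool when they find it.
# Pool names, marker format, and the XOR parity are illustrative only.

import rados

K = 4                      # data chunks per object (assumed)
MARKER = b"EC-MARKER\n"    # placeholder object contents (assumed format)


def split(data, k):
    """Split data into k equal-sized chunks, zero-padding the last one."""
    chunk_len = (len(data) + k - 1) // k
    data = data.ljust(chunk_len * k, b"\0")
    return [data[i * chunk_len:(i + 1) * chunk_len] for i in range(k)]


def xor_parity(chunks):
    """One parity chunk: byte-wise XOR of the data chunks (stand-in for
    a real erasure code; it can rebuild at most one lost chunk)."""
    parity = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, byte in enumerate(bytearray(chunk)):
            parity[i] ^= byte
    return bytes(parity)


def demote(hot, cold, name):
    """Encode an object from the hot pool into chunks in the cold pool,
    then leave a small marker in its place."""
    size, _ = hot.stat(name)
    data = hot.read(name, size, 0)
    chunks = split(data, K)
    for i, chunk in enumerate(chunks):
        cold.write_full("%s.%d" % (name, i), chunk)
    cold.write_full("%s.parity" % name, xor_parity(chunks))
    # The marker records the original size so reads can strip the padding.
    hot.write_full(name, MARKER + str(size).encode())


def proxied_read(hot, cold, name):
    """Read an object, transparently reassembling it from the cold pool
    when the hot pool only holds a marker."""
    size, _ = hot.stat(name)
    head = hot.read(name, size, 0)
    if not head.startswith(MARKER):
        return head                      # still a plain replicated object
    orig_size = int(head[len(MARKER):].decode())
    data = b"".join(cold.read("%s.%d" % (name, i), orig_size, 0)
                    for i in range(K))
    return data[:orig_size]


if __name__ == "__main__":
    cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
    cluster.connect()
    hot = cluster.open_ioctx("hot")      # assumed replicated pool
    cold = cluster.open_ioctx("cold")    # assumed pool holding the chunks
    demote(hot, cold, "some-object")     # hypothetical object name
    print(len(proxied_read(hot, cold, "some-object")))
    hot.close()
    cold.close()
    cluster.shutdown()

A real version would of course use a proper erasure code and actually rebuild missing chunks from parity, and the read path would live in the modified client (or the OSD, in Josh's variant) rather than a throwaway script, but the hot-pool marker plus cold-pool chunks is the whole trick.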