On Thu, 18 Apr 2013 16:09:52 -0500
Mark Nelson <mark.nelson@xxxxxxxxxxx> wrote:

> On 04/18/2013 04:08 PM, Josh Durgin wrote:
> > On 04/18/2013 01:47 PM, Sage Weil wrote:
> >> On Thu, 18 Apr 2013, Plaetinck, Dieter wrote:
> >>> sorry to bring this up again, googling revealed some people don't
> >>> like the subject [anymore].
> >>>
> >>> but I'm working on a new +- 3PB cluster for storage of immutable files.
> >>> and it would be either all cold data, or mostly cold. 150MB avg
> >>> filesize, max size 5GB (for now)
> >>> For this use case, my impression is erasure coding would make a lot
> >>> of sense
> >>> (though I'm not sure about the computational overhead on storing and
> >>> loading objects..? outbound traffic would peak at 6 Gbps, but I can
> >>> make it way less and still keep a large cluster, by taking away the
> >>> small set of hot files.
> >>> inbound traffic would be minimal)
> >>>
> >>> I know that the answer a while ago was "no plans to implement erasure
> >>> coding", has this changed?
> >>> if not, is anyone aware of a similar system that does support it? I
> >>> found QFS but that's meant for batch processing, has a single
> >>> 'namenode' etc.
> >>
> >> We would love to do it, but it is not a priority at the moment (things
> >> like multi-site replication are in much higher demand). That of course
> >> doesn't prevent someone outside of Inktank from working on it :)
> >>
> >> The main caveat is that it will be complicated. For an initial
> >> implementation, the full breadth of the rados API probably wouldn't be
> >> supported for erasure/parity encoded pools (things like rados classes and
> >> the omap key/value api get tricky when you start talking about parity).
> >> But for many (or even most) use cases, objects are just bytes, and those
> >> restrictions are just fine.
> >
> > I talked to some folks interested in doing a more limited form of this
> > yesterday. They started a blueprint [1].
> > One of their ideas was to have
> > erasure coding done by a separate process (or thread perhaps). It would
> > use erasure coding on an object and then use librados to store the
> > erasure-encoded pieces in a separate pool, and finally leave a marker in
> > place of the original object in the first pool.
> >
> > When the osd detected this marker, it would proxy the request to the
> > erasure coding thread/process which would service the request on the
> > second pool for reads, and potentially make writes move the data back to
> > the first pool in a tiering sort of scenario.
> >
> > I might have misremembered some details, but I think it's an
> > interesting way to get many of the benefits of erasure coding with a
> > relatively small amount of work compared to a fully native osd solution.
> >
> > Josh
>
> Neat. :)
>

@Bryan: I did come across cleversafe. All the articles around it seemed promising, but unfortunately it seems everything related to the cleversafe open source project somehow vanished from the internet (e.g. http://www.cleversafe.org/). Quite weird...

@Sage: interesting. I thought it would be relatively simple if one assumes the restriction of immutable files. I'm not familiar with those ceph specifics you're mentioning. When building an erasure-code-based system, maybe there are ways to reuse existing ceph code and/or allow some integration with replication-based objects, without aiming for full integration or full support of the rados api, based on some tradeoffs.

@Josh, that sounds like an interesting approach. Too bad that page doesn't contain any information yet :)

Dieter
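
To make the scheme Josh describes a bit more concrete (encode an object, store the pieces in a second pool, leave a marker in the first pool, and serve reads through the marker), here is a rough Python sketch. It is only an illustration under heavy assumptions: the two "pools" are plain in-memory dicts rather than librados pools, the names `demote`, `read`, `Marker`, and `K` are made up for this example, and a single XOR parity piece stands in for a real erasure code such as Reed-Solomon, so it survives the loss of only one piece per object.

```python
from functools import reduce
from typing import NamedTuple

K = 4  # number of data pieces per object (hypothetical choice)


class Marker(NamedTuple):
    """Stub left in the first pool in place of the original object."""
    length: int  # original object size, needed to strip padding on read


# Stand-ins for the two pools; a real setup would go through librados.
hot_pool = {}   # replicated pool: holds full objects or Markers
cold_pool = {}  # holds the encoded pieces, keyed "<name>.<index>"


def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data, k=K):
    """Split data into k equal-size pieces plus one XOR parity piece."""
    size = -(-len(data) // k) or 1  # ceiling division, at least 1 byte
    pieces = [data[i * size:(i + 1) * size].ljust(size, b"\0")
              for i in range(k)]
    return pieces + [reduce(xor_bytes, pieces)]


def decode(pieces, length):
    """Rebuild the object; at most one piece may be None (lost)."""
    missing = [i for i, p in enumerate(pieces) if p is None]
    if len(missing) > 1:
        raise ValueError("single parity can only recover one lost piece")
    pieces = list(pieces)
    if missing:
        # The XOR of all k+1 pieces is zero, so the lost piece equals
        # the XOR of the surviving ones.
        survivors = [p for p in pieces if p is not None]
        pieces[missing[0]] = reduce(xor_bytes, survivors)
    return b"".join(pieces[:-1])[:length]


def demote(name):
    """Move an object to the cold pool and leave a marker behind."""
    data = hot_pool[name]
    for i, piece in enumerate(encode(data)):
        cold_pool[f"{name}.{i}"] = piece
    hot_pool[name] = Marker(len(data))


def read(name):
    """Serve a read, proxying to the cold pool when a marker is found."""
    value = hot_pool[name]
    if not isinstance(value, Marker):
        return value
    pieces = [cold_pool.get(f"{name}.{i}") for i in range(K + 1)]
    return decode(pieces, value.length)
```

After `hot_pool["obj"] = data` and `demote("obj")`, a call to `read("obj")` should still return `data`, even if one of the `obj.N` entries has been deleted from `cold_pool`. The write path (moving data back to the first pool) is left out, which matches the immutable-files restriction discussed above.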