On Wed, Jul 15, 2015 at 10:50 PM, John Spray <john.spray@xxxxxxxxxx> wrote:
>
> On 15/07/15 16:57, Shane Gibson wrote:
>>
>> We are in the (very) early stages of considering testing backing Hadoop
>> via Ceph - as opposed to HDFS. I've seen a few very vague references to
>> doing that, but haven't found any concrete info (architecture, configuration
>> recommendations, gotchas, lessons learned, etc...). I did find the
>> ceph.com/docs/ info [1] which discusses use of CephFS for backing Hadoop -
>> but this would be foolish for production clusters given that CephFS isn't
>> yet considered production quality/grade.
>
> For analytics workloads where you're handling ephemeral datasets or scratch
> data, you might find that self-supporting a cephfs instance is a workable
> solution. The in-development fsck parts of cephfs are usually more of a
> concern for long term storage use cases, and for providing fully
> vendor-supported systems. I'd encourage you to try out the hadoop+cephfs
> setup and let us know what kind of issues you hit, if any.

Yep! The Hadoop workload is a fairly simple one that is unlikely to
break anything in CephFS. We run a limited set of Hadoop tests on it
every week and provide bindings to set it up; I think the
documentation is a bit lacking here but if you've ever used a
third-party FS with Hadoop I don't think it should be too challenging.
I'm hoping we get better documentation written up soonish.
-Greg
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com