On 15/07/15 16:57, Shane Gibson wrote:
> We are in the (very) early stages of considering testing Ceph as the backing store for Hadoop, as opposed to HDFS. I've seen a few very vague references to doing that, but haven't found any concrete info (architecture, configuration recommendations, gotchas, lessons learned, etc.). I did find the ceph.com/docs/ info [1], which discusses using CephFS to back Hadoop, but this would be foolish for production clusters given that CephFS isn't yet considered production grade.
For analytics workloads that handle ephemeral datasets or scratch data, you might find that self-supporting a CephFS instance is a workable solution. The CephFS fsck (filesystem check and repair) tooling is still in development, and that gap is usually more of a concern for long-term storage use cases and for fully vendor-supported systems. I'd encourage you to try out the Hadoop+CephFS setup and let us know what kind of issues you hit, if any.
Cheers,
John

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com