On Friday, May 30, 2014, Ignazio Cassano <ignaziocassano at gmail.com> wrote:

> Hi all,
> I am testing ceph because I found it is very interesting as far as remote
> block device is concerned.
> But my company is very interested in big data.
> So I read something about hadoop and ceph integration.
> Anyone can suggest me some documentation explaining the purpose of
> ceph/hadoop integration ?
> Why don't use only hadoop for big data ?

It has a few advantages now:
1) if you're already running Ceph, you only need to manage one storage cluster
2) you get all of Ceph's reliability, resiliency, and dynamism
3) you get a real POSIX filesystem that you can run Hadoop workloads against (which means other data analytics systems can work on the same data)

In the future, when CephFS is more fully supported for production use, you'll also be able to use Ceph as the canonical location for all your data and run Hadoop workloads against it directly, without having to do an export/import first.
-Greg

--
Software Engineer #42 @ http://inktank.com | http://ceph.com