Hi.
This is my first post to the list. First of all, I should say that I'm new to Hadoop.
We are a small lab and have been running CephFS for almost two years, loading it with large files (4GB to 4TB in size). Our cluster is approximately 400TB with ~75% usage, and we are planning to grow a lot.
Until now, we have processed most of the files by reading them serially. Now we want to process these files in parallel, and we are looking at the Hadoop plugin as a way to use MapReduce or something similar.
Does the Hadoop plugin access CephFS over the network, as in a normal cluster, or can I install Hadoop's processing daemons on every Ceph node and process the data locally?
Thanks and regards,
--
Aristeu
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com