Re: backing Hadoop with Ceph ??


 




Somnath - thanks for the reply ... 

:-)  Haven't tried anything yet - just starting to gather info/input/direction for this solution. 

Looking at the S3 API info [2] - there is no mention of support for the "S3a" API extensions - namely "rename" support.  The problem with backing Hadoop via the plain S3 API: if you need to rename a large (say multi-GB) data object, you have to copy it to the new name and then delete the original - a very IO expensive operation, and something we do a lot of.  That in and of itself might be a deal breaker ...   Any idea/input/intention of supporting the S3a extensions within the RadosGW S3 API implementation? 
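To make the IO cost concrete - below is a rough sketch of what a "rename" amounts to when all you have is the plain S3 API, using the AWS Java SDK.  The RGW endpoint, bucket, keys, and credentials are made up for illustration.

import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.client.builder.AwsClientBuilder;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class RgwRenameCost {
    public static void main(String[] args) {
        // Hypothetical RGW endpoint and credentials - substitute real values.
        AmazonS3 s3 = AmazonS3ClientBuilder.standard()
                .withEndpointConfiguration(new AwsClientBuilder.EndpointConfiguration(
                        "http://rgw.example.com:7480", "us-east-1"))
                .withCredentials(new AWSStaticCredentialsProvider(
                        new BasicAWSCredentials("ACCESS_KEY", "SECRET_KEY")))
                .withPathStyleAccessEnabled(true)
                .build();

        // S3 has no rename primitive: a "rename" is a full copy of the object
        // to the new key followed by a delete of the old key, so the backend
        // IO cost scales with object size.
        s3.copyObject("analytics", "datasets/scan-results.dat",
                      "analytics", "datasets/scan-results-renamed.dat");
        s3.deleteObject("analytics", "datasets/scan-results.dat");
    }
}

The copy is server-side, but the cluster still rewrites the entire object before the old key is deleted - so for multi-GB objects every rename is effectively a full re-write.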

Plus - it seems it's considered a "bad idea" to back Hadoop via S3 (and indirectly Ceph via RGW) [3]; though I'm not sure whether the architectural differences between Amazon's S3 implementation and the far superior Ceph make it more palatable? 
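If we did try it anyway, I'd expect the Hadoop-side wiring to look roughly like the sketch below - this assumes the hadoop-aws "s3a" connector, with property names as I understand them (they vary a bit between Hadoop versions); the endpoint, bucket, and credentials are placeholders.

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HadoopOverRgw {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point the s3a connector at a RadosGW endpoint instead of AWS.
        // Endpoint, credentials, and bucket names are placeholders.
        conf.set("fs.s3a.endpoint", "rgw.example.com:7480");
        conf.set("fs.s3a.access.key", "ACCESS_KEY");
        conf.set("fs.s3a.secret.key", "SECRET_KEY");
        conf.set("fs.s3a.connection.ssl.enabled", "false");

        FileSystem fs = FileSystem.get(URI.create("s3a://analytics/"), conf);
        // rename() over s3a is still implemented as copy + delete under the
        // covers - the same IO cost concern raised above.
        fs.rename(new Path("s3a://analytics/raw/scan-results.dat"),
                  new Path("s3a://analytics/staged/scan-results.dat"));
        fs.close();
    }
}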

~~shane 




On 7/15/15, 9:50 AM, "Somnath Roy" <Somnath.Roy@xxxxxxxxxxx> wrote:

Did you try to integrate ceph + rgw + s3 with Hadoop?

Sent from my iPhone

On Jul 15, 2015, at 8:58 AM, Shane Gibson <Shane_Gibson@xxxxxxxxxxxx> wrote:



We are in the (very) early stages of considering a test of backing Hadoop via Ceph - as opposed to HDFS.  I've seen a few very vague references to doing that, but haven't found any concrete info (architecture, configuration recommendations, gotchas, lessons learned, etc...).   I did find the ceph.com/docs/ info [1], which discusses use of CephFS for backing Hadoop - but that would be foolish for production clusters, given that CephFS isn't yet considered production quality/grade.  
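(For reference, the CephFS route in [1] goes through the cephfs-hadoop bindings - roughly along the lines of the sketch below.  It assumes the cephfs-hadoop jar and libcephfs are on the classpath; the monitor address and paths are made up, and the property names are as I recall them from the docs.)

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HadoopOverCephFS {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Wire Hadoop to CephFS through the cephfs-hadoop bindings
        // instead of HDFS.  Monitor address and paths are placeholders.
        conf.set("fs.defaultFS", "ceph://mon1.example.com:6789/");
        conf.set("fs.ceph.impl", "org.apache.hadoop.fs.ceph.CephFileSystem");
        conf.set("ceph.conf.file", "/etc/ceph/ceph.conf");

        FileSystem fs = FileSystem.get(URI.create("ceph://mon1.example.com:6789/"), conf);
        fs.mkdirs(new Path("/user/hadoop"));
        fs.close();
    }
}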

Does anyone in the ceph-users community have experience with this that they'd be willing to share?   Preferably ... via use of Ceph - not via CephFS ... but I am interested in any CephFS-related experiences too.

If we were to do this, and Ceph proved out as a backing store to Hadoop - we could end up building a fairly large, multi-petabyte (100s ??) class Ceph backing store.  We do a very large amount of analytics on a lot of data sets for security trending, correlations, etc... 

Our current Ceph experience is limited to a few small (90 x 4TB OSD) clusters - which we are working towards putting into production as Glance/Cinder backing and as block storage for various platforms with large storage needs (e.g. software and package repos/mirrors, etc...). 

Thanks in advance for any input, thoughts, or pointers ... 

~~shane 







_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
