Re: backing Hadoop with Ceph ??

On 07/15/2015 11:48 AM, Shane Gibson wrote:

Somnath - thanks for the reply ...

:-)  Haven't tried anything yet - just starting to gather
info/input/direction for this solution.

Looking at the S3 API info [2] - there is no mention of support for the
"S3a" API extensions - namely "rename" support.  The problem with
backing via the S3 API: if you need to rename a large (say multi-GB)
data object, you have to copy it to the new name and then delete the
original - a very IO-expensive operation, and something we do a lot
of.  That in and of itself might be a deal breaker ...  Any
idea/input/intention of supporting the S3a extensions within the
RadosGW S3 API implementation?
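
To illustrate the cost: over a plain S3 endpoint, a "rename" boils down
to something like the following (a rough sketch using the AWS Java SDK
- not Hadoop's actual S3AFileSystem code; the bucket and key names are
placeholders):

    import com.amazonaws.services.s3.AmazonS3;

    class S3RenameSketch {
        // Hypothetical helper: "rename" over plain S3 is copy + delete.
        static void rename(AmazonS3 s3, String bucket, String src, String dst) {
            // The server-side copy still re-writes every byte of the
            // object (and objects over 5 GB need a multipart copy).
            s3.copyObject(bucket, src, bucket, dst);
            // Only after the copy succeeds is the original removed.
            s3.deleteObject(bucket, src);
        }
    }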

I see you're trying out CephFS now, and I think that makes sense.

I just wanted to mention that at CDS a couple of weeks ago, Yehuda
noted that RGW's rename is cheap, since it does not require copying
the data, just updating its location [1].
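
A toy illustration of why that is cheap (not RGW's real internals, just
the idea that the object data never moves):

    import java.util.HashMap;
    import java.util.Map;

    class ToyNameIndex {
        // Maps object names to the location of their immutable data.
        private final Map<String, String> index = new HashMap<>();

        void rename(String from, String to) {
            // A metadata update, O(1) regardless of object size, versus
            // the O(size) copy+delete required over plain S3.
            index.put(to, index.remove(from));
        }
    }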

Josh

[1] http://pad.ceph.com/p/hadoop-over-rgw

Plus - it seems it's considered a "bad idea" to back Hadoop via S3
(and indirectly Ceph via RGW) [3]; though I'm not sure whether the
architectural differences between Amazon's S3 implementation and Ceph
(which I'd argue is far superior) make it more palatable?

~~shane

[2] http://ceph.com/docs/master/radosgw/s3/
[3] https://wiki.apache.org/hadoop/AmazonS3




On 7/15/15, 9:50 AM, "Somnath Roy" <Somnath.Roy@xxxxxxxxxxx> wrote:

    Did you try to integrate Ceph + RGW + S3 with Hadoop?
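
    (For concreteness, wiring Hadoop's s3a connector to a RadosGW
    endpoint would look something like the sketch below - the endpoint
    and credentials are placeholders, and the exact property names
    depend on the Hadoop version:)

        import org.apache.hadoop.conf.Configuration;

        public class S3aToRgwSketch {
            public static void main(String[] args) {
                Configuration conf = new Configuration();
                // Point s3a at the RadosGW endpoint instead of AWS.
                conf.set("fs.s3a.endpoint", "rgw.example.com:7480");
                conf.set("fs.s3a.access.key", "ACCESS_KEY");
                conf.set("fs.s3a.secret.key", "SECRET_KEY");
                // Jobs would then read/write s3a://bucket/path URIs.
            }
        }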

    Sent from my iPhone

    On Jul 15, 2015, at 8:58 AM, Shane Gibson <Shane_Gibson@xxxxxxxxxxxx> wrote:



    We are in the (very) early stages of considering testing backing
    Hadoop via Ceph - as opposed to HDFS.  I've seen a few very vague
    references to doing that, but haven't found any concrete info
    (architecture, configuration recommendations, gotchas, lessons
    learned, etc...).  I did find the ceph.com/docs/ info [1], which
    discusses use of CephFS for backing Hadoop - but this would be
    foolish for production clusters given that CephFS isn't yet
    considered production quality/grade.
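
    (Per those docs [1], the wiring is roughly the following - a hedged
    sketch; the monitor address and paths are placeholders:)

        import org.apache.hadoop.conf.Configuration;

        public class CephFsHadoopSketch {
            public static void main(String[] args) {
                Configuration conf = new Configuration();
                // The CephFS Hadoop bindings supply this FileSystem impl.
                conf.set("fs.ceph.impl",
                         "org.apache.hadoop.fs.ceph.CephFileSystem");
                // Monitor address of the CephFS cluster (placeholder).
                conf.set("fs.default.name", "ceph://mon-host:6789/");
                conf.set("ceph.conf.file", "/etc/ceph/ceph.conf");
                // Jobs then target ceph:// paths instead of hdfs://.
            }
        }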

    Does anyone in the ceph-users community have experience with this
    that they'd be willing to share?  Preferably via use of Ceph - not
    via CephFS - but I am interested in any CephFS-related experiences
    too.

    If we were to do this, and Ceph proved out as a backing store for
    Hadoop - we could end up building a fairly large multi-petabyte
    (possibly hundreds of PB) Ceph backing store.  We do a very large
    amount of analytics across many data sets for security trending,
    correlations, etc...

    Our current Ceph experience is limited to a few small clusters
    (90 x 4 TB OSDs each), which we are working towards putting into
    production as Glance/Cinder backing and as block storage for
    various platforms with large storage needs (e.g. software and
    package repos/mirrors).

    Thanks in advance for any input, thoughts, or pointers ...

    ~~shane

    [1] http://ceph.com/docs/master/cephfs/hadoop/









_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


