Re: Cephfs Hadoop Plugin and CEPH integration


On Wed, Nov 29, 2017 at 6:54 PM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> On Wed, Nov 29, 2017 at 8:52 AM Aristeu Gil Alves Jr <aristeu.jr@xxxxxxxxx>
> wrote:
>>>
>>> > Does s3 or swift (for Hadoop or Spark) have integrated data-layout
>>> > APIs for
>>> > processing data locally, as the CephFS Hadoop plugin has?
>>> >
>>> With s3 and swift you won't have data locality, as they were designed for
>>> public cloud.
>>> We recommend disabling locality-based scheduling in Hadoop when running
>>> with those connectors.
>>> There is ongoing work to optimize those connectors to work with
>>> object storage.
>>> The Hadoop community works on the s3a connector.
>>> There is also https://github.com/SparkTC/stocator which is a swift
>>> based connector IBM wrote for their cloud.
>>
>>
>>
>> Assuming these cases, how would a MapReduce job work without data
>> locality?
>> How do the processors get the data? There's still the need to split the data,
>> no?
>> Doesn't it severely impact performance on big files (not just the
>> network)?
>>
>
> Given that you already have your data in CephFS (and have been using it
> successfully for two years!), I'd try using its Hadoop plugin and seeing if
> it suits your needs. Trying a less-supported plugin is a lot easier than
> rolling out a new storage stack! :)

completely agree :)
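
On the earlier question about splitting without locality, here is a minimal
sketch in Python. It is not Hadoop's actual code, and `Split`,
`compute_splits` and `assign` are hypothetical names used only for
illustration. It assumes splits are plain byte ranges computed from the file
size, and that locality is just an optional scheduling hint: when the store
reports no hosts, any worker can take any split and read that range over the
network.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Split:
    """A byte range of one input file, plus optional preferred hosts."""
    path: str
    offset: int
    length: int
    hosts: List[str] = field(default_factory=list)  # empty = no locality info


def compute_splits(
    path: str,
    file_size: int,
    split_size: int,
    hosts_for_range: Optional[Callable[[int, int], List[str]]] = None,
) -> List[Split]:
    """Cut a file into fixed-size byte ranges (simplified, illustrative).

    hosts_for_range is an assumed callback standing in for whatever the
    storage layer can report about where a range lives. Object-store style
    connectors would pass None here.
    """
    splits = []
    offset = 0
    while offset < file_size:
        length = min(split_size, file_size - offset)
        hosts = hosts_for_range(offset, length) if hosts_for_range else []
        splits.append(Split(path, offset, length, hosts))
        offset += length
    return splits


def assign(splits: List[Split], workers: List[str]) -> Dict[int, str]:
    """Map each split offset to a worker.

    Prefer a worker that holds the data; otherwise fall back to round-robin,
    in which case the worker fetches its byte range remotely.
    """
    assignment = {}
    for i, split in enumerate(splits):
        local = [w for w in workers if w in split.hosts]
        assignment[split.offset] = local[0] if local else workers[i % len(workers)]
    return assignment
```

So the data still gets split either way. What changes without locality is
only that every range is read over the network, which is where the cost for
big files comes from.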

> -Greg
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com