Re: Cephfs Hadoop Plugin and CEPH integration

> Do s3 or swifta (for Hadoop or Spark) have integrated data-layout APIs for
> processing data locally, as the cephfs Hadoop plugin has?
>
With s3 and swift you won't have data locality, as they were designed for
the public cloud.
We recommend disabling locality-based scheduling in Hadoop when running
with those connectors.
There is ongoing work to optimize those connectors to work with
object storage.
The Hadoop community is working on the s3a connector.
There is also https://github.com/SparkTC/stocator, which is a swift-based
connector IBM wrote for their cloud.


Assuming these cases, how would a MapReduce job work without data locality?
How do the processors get the data? The data still needs to be split, no?
Doesn't that severely impact performance on big files (not just the network)?
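
To make the question concrete, here is a minimal Python sketch of how I
picture it, under my own assumptions: the input is still cut into byte-range
splits, but since no node holds the data, every task fetches its range over
the network. The names Split, compute_splits and read_split are hypothetical
and not taken from any connector; read_split only stands in for a ranged read
against the object store, and the real Hadoop split logic is more involved
than this.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Split:
    """A byte range of one object, handed to a single task (hypothetical type)."""
    start: int
    length: int


def compute_splits(file_size: int, split_size: int) -> List[Split]:
    """Cut an object of file_size bytes into consecutive byte-range splits."""
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    splits = []
    offset = 0
    while offset < file_size:
        # The last split may be shorter than split_size.
        length = min(split_size, file_size - offset)
        splits.append(Split(offset, length))
        offset += length
    return splits


def read_split(blob: bytes, split: Split) -> bytes:
    """Stand-in for a ranged read: any worker can fetch any range remotely."""
    return blob[split.start:split.start + split.length]


# In-memory bytes stand in for an object stored in s3/swift.
blob = bytes(range(256)) * 4          # 1024 bytes
splits = compute_splits(len(blob), 300)
parts = [read_split(blob, s) for s in splits]
```

If that picture is right, the split step itself does not need locality; the
cost is that every byte of every split crosses the network at least once.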

--
Aristeu
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
