Re: Hadoop / Ceph and Data locality ?

Noah Watkins <noah.watkins@xxxxxxxxxxx> · Mon, 8 Jul 2013 13:36:20 -0700

Yes, all of the code needed to get the locality information should be
present in the version of the jar file you referenced. We have tested it a
bit to make sure the right data is available, but have not extensively
tested that it is being used correctly by core Hadoop (e.g., that it is
being correctly propagated out of CephFileSystem). IIRC fixing this
/should/ be pretty easy: fiddling with getFileBlockLocations.
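For readers unfamiliar with the method: `getFileBlockLocations(file, start, len)` is expected to return one entry per storage block that overlaps the byte range [start, start + len), each carrying the block's offset, length and the hosts holding it. The Java sketch below shows only that range arithmetic. It is not the CephFileSystem code: the fixed `blockSize`, the `BlockLoc` class and the `hostsFor` lookup are hypothetical stand-ins for Hadoop's `BlockLocation` and for whatever the Ceph client reports about object placement.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.LongFunction;

/** Hypothetical stand-in for org.apache.hadoop.fs.BlockLocation: offset, length, hosts. */
final class BlockLoc {
    final long offset;
    final long length;
    final String[] hosts;

    BlockLoc(long offset, long length, String[] hosts) {
        this.offset = offset;
        this.length = length;
        this.hosts = hosts;
    }

    @Override
    public String toString() {
        return offset + "+" + length + " " + Arrays.toString(hosts);
    }
}

final class BlockRanges {
    /**
     * Returns one BlockLoc per fixed-size block overlapping [start, start + len),
     * each reported whole (block offset and block length, not the requested slice).
     * blockSize and hostsFor are illustrative assumptions; a real FileSystem
     * would ask its storage layer where each piece of the file lives.
     */
    static List<BlockLoc> locate(long fileLen, long blockSize, long start, long len,
                                 LongFunction<String[]> hostsFor) {
        List<BlockLoc> out = new ArrayList<>();
        if (start < 0 || len <= 0 || start >= fileLen) {
            return out; // empty or out-of-range request: nothing to report
        }
        long end = Math.min(fileLen, start + len); // exclusive end, clipped to the file
        long first = start / blockSize;            // index of first overlapping block
        long last = (end - 1) / blockSize;         // index of last overlapping block
        for (long b = first; b <= last; b++) {
            long blockStart = b * blockSize;
            long blockLen = Math.min(blockSize, fileLen - blockStart); // last block may be short
            out.add(new BlockLoc(blockStart, blockLen, hostsFor.apply(b)));
        }
        return out;
    }
}
```

For example, with a 10-byte file and 4-byte blocks, asking for bytes 3 through 6 returns the blocks starting at offsets 0 and 4, each reported whole rather than clipped to the request.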

On Mon, Jul 8, 2013 at 1:25 PM, ker can <kercan74@xxxxxxxxx> wrote:
> Hi Noah,
>
> I'm using the CephFS jar from ...
> http://ceph.com/download/hadoop-cephfs.jar
> I believe this is built from hadoop-common/cephfs/branch-1.0?
>
> If that's the case, I should already be using an implementation that has
> getFileBlockLocations(), which is here:
> https://github.com/ceph/hadoop-common/blob/cephfs/branch-1.0/src/core/org/apache/hadoop/fs/ceph/CephFileSystem.java
>
> Is there a command line tool that I can use to verify the results from
> getFileBlockLocations()?
>
> thanks
> KC
>
>
>
> On Mon, Jul 8, 2013 at 3:09 PM, Noah Watkins <noah.watkins@xxxxxxxxxxx> wrote:
>>
>> Hi KC,
>>
>> The locality information is now collected and available to Hadoop
>> through the CephFS API, so fixing this is certainly possible. However,
>> there has not been extensive testing. I think the tasks that need to
>> be completed are (1) make sure that `CephFileSystem` is encoding the
>> correct block locations in `getFileBlockLocations` (which I think it
>> already does, but that needs to be verified), and (2) make sure rack
>> information is available in the jobtracker, or optionally use a flat
>> hierarchy (i.e. default-rack).
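Task (2) above concerns how the jobtracker compares a split's hosts with the tracker asking for work. A minimal sketch of that comparison under a flat hierarchy, where every host resolves to `/default-rack/<host>` (the form seen in the HDFS log below), might look like this. The `Locality` enum and `FlatTopology` class are hypothetical illustrations, not Hadoop's scheduler, and the tracker-name parsing assumes the `tracker_<host>:...` format visible in the logs in this thread.

```java
import java.util.List;

/** Illustrative locality levels (hypothetical names, not Hadoop's own enum). */
enum Locality { DATA_LOCAL, RACK_LOCAL, NON_LOCAL }

final class FlatTopology {
    static final String DEFAULT_RACK = "/default-rack";

    /** Extracts "vega7250" from a tracker name like "tracker_vega7250:localhost/127.0.0.1:35422". */
    static String trackerHost(String trackerName) {
        String s = trackerName.startsWith("tracker_")
                ? trackerName.substring("tracker_".length())
                : trackerName;
        int colon = s.indexOf(':');
        return colon >= 0 ? s.substring(0, colon) : s;
    }

    /** Flat hierarchy: every host resolves to /default-rack/<host>. */
    static String resolve(String host) {
        return DEFAULT_RACK + "/" + host;
    }

    /** "/default-rack/vega7247" -> "/default-rack". */
    static String rackOf(String path) {
        return path.substring(0, path.lastIndexOf('/'));
    }

    /** Classifies a split, given the hosts reported for it, relative to one tracker. */
    static Locality classify(List<String> splitHosts, String trackerName) {
        String trackerPath = resolve(trackerHost(trackerName));
        Locality best = Locality.NON_LOCAL; // a split with no hosts can never be local
        for (String h : splitHosts) {
            String p = resolve(h);
            if (p.equals(trackerPath)) {
                return Locality.DATA_LOCAL; // plain string match on the resolved path
            }
            if (rackOf(p).equals(rackOf(trackerPath))) {
                best = Locality.RACK_LOCAL;
            }
        }
        return best;
    }
}
```

Because the match is a plain string comparison, a split whose hosts are reported as IP addresses or fully qualified names would not be data-local for a tracker known by its short name (`vega7250`) in this sketch, which is worth keeping in mind when verifying (1).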
>>
>> On Mon, Jul 8, 2013 at 12:47 PM, ker can <kercan74@xxxxxxxxx> wrote:
>> > Hi There,
>> >
>> > I'm test driving Hadoop with CephFS as the storage layer. I was running
>> > the Terasort benchmark and noticed a lot of network IO activity compared
>> > to an HDFS storage layer setup. (It's a half-terabyte sort workload over
>> > two data nodes.)
>> >
>> > Digging into the job tracker logs a little, I noticed that all the map
>> > tasks were being assigned to process a split (block) on non-local nodes,
>> > which explains all the network activity during the map phase.
>> >
>> > With Ceph:
>> >
>> > 2013-07-08 11:19:53,535 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_201307081115_0001 = 500000000000. Number of splits = 7452
>> > 2013-07-08 11:19:53,538 INFO org.apache.hadoop.mapred.JobInProgress: Job job_201307081115_0001 initialized successfully with 7452 map tasks and 32 reduce tasks.
>> >
>> > 2013-07-08 11:19:54,836 INFO org.apache.hadoop.mapred.JobInProgress: Choosing a non-local task task_201307081115_0001_m_000000
>> > 2013-07-08 11:19:54,836 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201307081115_0001_m_000000_0' to tip task_201307081115_0001_m_000000, for tracker 'tracker_vega7250:localhost/127.0.0.1:35422'
>> >
>> > 2013-07-08 11:19:54,990 INFO org.apache.hadoop.mapred.JobInProgress: Choosing a non-local task task_201307081115_0001_m_000001
>> > 2013-07-08 11:19:54,990 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201307081115_0001_m_000001_0' to tip task_201307081115_0001_m_000001, for tracker 'tracker_vega7249:localhost/127.0.0.1:36725'
>> >
>> > ... and so on.
>> >
>> > In comparison, with HDFS the job tracker logs looked something like
>> > this; the map tasks were being assigned to process data blocks on the
>> > local nodes.
>> >
>> > 2013-07-08 03:55:32,656 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_201307080351_0001 = 500000000000. Number of splits = 7452
>> > 2013-07-08 03:55:32,657 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201307080351_0001_m_000000 has split on node:/default-rack/vega7247
>> > 2013-07-08 03:55:32,657 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201307080351_0001_m_000001 has split on node:/default-rack/vega7247
>> > 2013-07-08 03:55:34,474 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201307080351_0001_m_000000_0' to tip task_201307080351_0001_m_000000, for tracker 'tracker_vega7247:localhost/127.0.0.1:43320'
>> > 2013-07-08 03:55:34,475 INFO org.apache.hadoop.mapred.JobInProgress: Choosing data-local task task_201307080351_0001_m_000000
>> > 2013-07-08 03:55:34,475 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201307080351_0001_m_000001_0' to tip task_201307080351_0001_m_000001, for tracker 'tracker_vega7247:localhost/127.0.0.1:43320'
>> > 2013-07-08 03:55:34,475 INFO org.apache.hadoop.mapred.JobInProgress: Choosing data-local task task_201307080351_0001_m_000001
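To compare runs like the two above without reading every line, one can count the scheduler's "Choosing data-local task" and "Choosing a non-local task" messages in the JobTracker log. The sketch below assumes each entry occupies a single line in the log file (the wrapping in this thread comes from email) and matches only those two message strings, copied from the logs quoted here; any other scheduler messages are ignored.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

final class LocalityCounter {
    // Message fragments taken verbatim from the JobTracker logs in this thread.
    static final String DATA_LOCAL = "Choosing data-local task";
    static final String NON_LOCAL = "Choosing a non-local task";

    /** Returns {dataLocal, nonLocal} counts over the given log lines. */
    static int[] count(Iterable<String> lines) {
        int dataLocal = 0;
        int nonLocal = 0;
        for (String line : lines) {
            if (line.contains(DATA_LOCAL)) {
                dataLocal++;
            } else if (line.contains(NON_LOCAL)) {
                nonLocal++;
            }
        }
        return new int[] {dataLocal, nonLocal};
    }

    /** Usage: java LocalityCounter <jobtracker log file> */
    public static void main(String[] args) throws IOException {
        // ISO-8859-1 decodes any byte sequence, so stray bytes cannot abort the read;
        // the whole file is loaded into memory, which is fine for a quick check.
        int[] c = count(Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1));
        System.out.println("data-local: " + c[0] + ", non-local: " + c[1]);
    }
}
```

Run it as `java LocalityCounter <path to jobtracker log>` once per run and compare the two pairs of counts.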
>> >
>> > Version Info:
>> > ceph version 0.61.4
>> > hadoop 1.1.2
>> >
>> > Has anyone else run into this?
>> >
>> > Thanks
>> > KC
>> >
>> > _______________________________________________
>> > ceph-users mailing list
>> > ceph-users@xxxxxxxxxxxxxx
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>
>