So, for example, the job tracker log had this output for the map task for the first split/block 0, which is executing on host vega7250.
....
....
2013-07-08 11:19:54,836 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201307081115_0001_m_000000_0' to tip task_201307081115_0001_m_000000, for tracker 'tracker_vega7250:localhost/127.0.0.1:35422'
...
...
If I look at how the blocks are divided up among the OSDs, block 0, for example, is managed by osd.2, which is running on host vega7249. However, our map task for block 0 is running on another host. Definitely not co-located.
FILE OFFSET    OBJECT                  OFFSET    LENGTH      OSD
0              10000000dbe.00000000    0         67108864    2
67108864       10000000dbe.00000001    0         67108864    13
134217728      10000000dbe.00000002    0         67108864    5
201326592      10000000dbe.00000003    0         67108864    4
….
….
Ceph osd tree:
# id    weight  type name               up/down reweight
-1      14      root default
-3      14          rack unknownrack
-2      7               host vega7249
0       1                   osd.0       up      1
1       1                   osd.1       up      1
2       1                   osd.2       up      1
3       1                   osd.3       up      1
4       1                   osd.4       up      1
5       1                   osd.5       up      1
6       1                   osd.6       up      1
-4      7               host vega7250
10      1                   osd.10      up      1
11      1                   osd.11      up      1
12      1                   osd.12      up      1
13      1                   osd.13      up      1
7       1                   osd.7       up      1
8       1                   osd.8       up      1
9       1                   osd.9       up      1
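
For reference, here's a minimal sketch of a little driver that just prints
whatever getFileBlockLocations() returns for a file, so its output can be
compared against the object/OSD map above. The class name is made up; it
assumes the Ceph settings are already in core-site.xml and that it is run
via bin/hadoop so that config is on the classpath:

import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PrintBlockLocations {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml from the classpath, so fs.default.name
        // (and the ceph settings) should already be in place.
        Configuration conf = new Configuration();
        Path path = new Path(args[0]);
        FileSystem fs = path.getFileSystem(conf);
        FileStatus stat = fs.getFileStatus(path);

        // Print each block of the file and the hosts the filesystem claims
        // hold it -- this is what the input splits end up seeing.
        BlockLocation[] blocks = fs.getFileBlockLocations(stat, 0, stat.getLen());
        for (BlockLocation b : blocks) {
            System.out.println("offset=" + b.getOffset()
                    + " length=" + b.getLength()
                    + " hosts=" + Arrays.toString(b.getHosts()));
        }
    }
}

If the hosts come back empty (or as localhost) for every block, the locality
information is being lost before it ever reaches the jobtracker.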
Yes, all of the code needed to get the locality information should be
present in the version of the jar file you referenced. We have tested
to make sure the right data is available, but have not extensively
tested that it is being used correctly by core Hadoop (e.g. that it is
being correctly propagated out of CephFileSystem). IIRC, fixing this
/should/ be pretty easy; it's mostly a matter of fiddling with
getFileBlockLocations().
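
Roughly, the shape of that fix is something like the sketch below. To be
clear, this is illustrative only: osdHostsForExtent() is a hypothetical
stand-in for whatever lookup the CephFS bindings provide to go from a file
extent to the OSDs (and then hosts) that store it, and the real code
belongs inside CephFileSystem itself.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;

public abstract class BlockLocationSketch {

    // Hypothetical stand-in: given a byte range of a file, return the
    // hostnames of the OSDs that store it (e.g. "vega7249" for the extent
    // held by osd.2 in the example above).
    protected abstract String[] osdHostsForExtent(Path path, long offset, long length)
            throws IOException;

    public BlockLocation[] getFileBlockLocations(FileStatus file, long start, long len)
            throws IOException {
        if (file == null) {
            return null;
        }
        long blockSize = file.getBlockSize();
        if (blockSize <= 0) {
            blockSize = 64L * 1024 * 1024;  // fall back to the 64 MB object size above
        }
        long end = Math.min(start + len, file.getLen());

        List<BlockLocation> locations = new ArrayList<BlockLocation>();
        // Walk the requested range one block (object) at a time.
        for (long offset = (start / blockSize) * blockSize; offset < end; offset += blockSize) {
            long length = Math.min(blockSize, file.getLen() - offset);
            String[] hosts = osdHostsForExtent(file.getPath(), offset, length);
            // Using hostnames for both names and hosts is a simplification;
            // the names are normally host:port pairs.
            locations.add(new BlockLocation(hosts, hosts, offset, length));
        }
        return locations.toArray(new BlockLocation[locations.size()]);
    }
}

As long as the hostnames returned here match the names the tasktrackers
report themselves under, the jobtracker should start picking data-local
tasks.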
On Mon, Jul 8, 2013 at 1:25 PM, ker can <kercan74@xxxxxxxxx> wrote:
> Hi Noah,
>
> I'm using the CephFS jar from ...
> http://ceph.com/download/hadoop-cephfs.jar
> I believe this is built from hadoop-common/cephfs/branch-1.0?
>
> If that's the case, I should already be using an implementation that's got
> getFileBlockLocations() ... which is here
> https://github.com/ceph/hadoop-common/blob/cephfs/branch-1.0/src/core/org/apache/hadoop/fs/ceph/CephFileSystem.java
>
> Is there a command-line tool that I can use to verify the results from
> getFileBlockLocations()?
>
> thanks
> KC
>
>
>
> On Mon, Jul 8, 2013 at 3:09 PM, Noah Watkins <noah.watkins@xxxxxxxxxxx>
> wrote:
>>
>> Hi KC,
>>
>> The locality information is now collected and available to Hadoop
>> through the CephFS API, so fixing this is certainly possible. However,
>> there has not been extensive testing. I think the tasks that need to
>> be completed are (1) make sure that `CephFileSystem` is encoding the
>> correct block location in `getFileBlockLocations` (which I think is
>> already done, but does need to be verified), and (2) make sure
>> rack information is available in the jobtracker, or optionally use a
>> flat hierarchy (i.e. default-rack).
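>>
>> For (2), if host-level locality is enough to start with, a flat mapping
>> along these lines should work (just a minimal sketch; it would be wired
>> in with the topology.node.switch.mapping.impl property):
>>
>> import java.util.ArrayList;
>> import java.util.List;
>> import org.apache.hadoop.net.DNSToSwitchMapping;
>> import org.apache.hadoop.net.NetworkTopology;
>>
>> // Map every node to /default-rack so the jobtracker can still make
>> // host-local assignments without any real rack information.
>> public class FlatRackMapping implements DNSToSwitchMapping {
>>     public List<String> resolve(List<String> names) {
>>         List<String> racks = new ArrayList<String>(names.size());
>>         for (int i = 0; i < names.size(); i++) {
>>             racks.add(NetworkTopology.DEFAULT_RACK);  // "/default-rack"
>>         }
>>         return racks;
>>     }
>> }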
>>
>> On Mon, Jul 8, 2013 at 12:47 PM, ker can <kercan74@xxxxxxxxx> wrote:
>> > Hi There,
>> >
>> > I'm test-driving Hadoop with CephFS as the storage layer. I was running
>> > the Terasort benchmark and I noticed a lot of network IO activity when
>> > compared to an HDFS storage layer setup. (It's a half-terabyte sort
>> > workload across two data nodes.)
>> >
>> > Digging into the job tracker logs a little, I noticed that all the map
>> > tasks were being assigned to process a split (block) on non-local nodes
>> > (which explains all the network activity during the map phase).
>> >
>> > With Ceph:
>> >
>> >
>> > 2013-07-08 11:19:53,535 INFO org.apache.hadoop.mapred.JobInProgress: Input
>> > size for job job_201307081115_0001 = 500000000000. Number of splits = 7452
>> > 2013-07-08 11:19:53,538 INFO org.apache.hadoop.mapred.JobInProgress: Job
>> > job_201307081115_0001 initialized successfully with 7452 map tasks and 32
>> > reduce tasks.
>> >
>> > 2013-07-08 11:19:54,836 INFO org.apache.hadoop.mapred.JobInProgress:
>> > Choosing a non-local task task_201307081115_0001_m_000000
>> > 2013-07-08 11:19:54,836 INFO org.apache.hadoop.mapred.JobTracker: Adding
>> > task (MAP) 'attempt_201307081115_0001_m_000000_0' to tip
>> > task_201307081115_0001_m_000000, for tracker
>> > 'tracker_vega7250:localhost/127.0.0.1:35422'
>> >
>> > 2013-07-08 11:19:54,990 INFO org.apache.hadoop.mapred.JobInProgress:
>> > Choosing a non-local task task_201307081115_0001_m_000001
>> > 2013-07-08 11:19:54,990 INFO org.apache.hadoop.mapred.JobTracker: Adding
>> > task (MAP) 'attempt_201307081115_0001_m_000001_0' to tip
>> > task_201307081115_0001_m_000001, for tracker
>> > 'tracker_vega7249:localhost/127.0.0.1:36725'
>> >
>> > ... and so on.
>> >
>> > In comparison, with HDFS the job tracker logs looked something like
>> > this. The map tasks were being assigned to process data blocks on the
>> > local nodes.
>> >
>> > 2013-07-08 03:55:32,656 INFO org.apache.hadoop.mapred.JobInProgress: Input
>> > size for job job_201307080351_0001 = 500000000000. Number of splits = 7452
>> > 2013-07-08 03:55:32,657 INFO org.apache.hadoop.mapred.JobInProgress:
>> > tip:task_201307080351_0001_m_000000 has split on
>> > node:/default-rack/vega7247
>> > 2013-07-08 03:55:32,657 INFO org.apache.hadoop.mapred.JobInProgress:
>> > tip:task_201307080351_0001_m_000001 has split on
>> > node:/default-rack/vega7247
>> > 2013-07-08 03:55:34,474 INFO org.apache.hadoop.mapred.JobTracker: Adding
>> > task (MAP) 'attempt_201307080351_0001_m_000000_0' to tip
>> > task_201307080351_0001_m_000000, for tracker
>> > 'tracker_vega7247:localhost/127.0.0.1:43320'
>> > 2013-07-08 03:55:34,475 INFO org.apache.hadoop.mapred.JobInProgress:
>> > Choosing data-local task task_201307080351_0001_m_000000
>> > 2013-07-08 03:55:34,475 INFO org.apache.hadoop.mapred.JobTracker: Adding
>> > task (MAP) 'attempt_201307080351_0001_m_000001_0' to tip
>> > task_201307080351_0001_m_000001, for tracker
>> > 'tracker_vega7247:localhost/127.0.0.1:43320'
>> > 2013-07-08 03:55:34,475 INFO org.apache.hadoop.mapred.JobInProgress:
>> > Choosing data-local task task_201307080351_0001_m_000001
>> >
>> > Version Info:
>> > ceph version 0.61.4
>> > hadoop 1.1.2
>> >
>> > Has anyone else run into this?
>> >
>> > Thanks
>> > KC
>> >
>> >
>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com