Re: Hadoop / Ceph and Data locality ?


 



Hi Noah, I think the current version may have a problem, though I haven't figured out where yet. Here is what I see when I compare the job tracker log messages with how the data blocks are distributed among the OSDs.

For example, the job tracker log had this output for the map task for the first split (block 0), which is executing on host vega7250.

....

....

2013-07-08 11:19:54,836 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201307081115_0001_m_000000_0' to tip task_201307081115_0001_m_000000, for tracker 'tracker_vega7250:localhost/127.0.0.1:35422'

...

...


If I look at how the blocks are divided up among the OSDs, block 0, for example, is managed by osd.2, which is running on host vega7249. However, the map task for block 0 is running on another host (vega7250), so it is definitely not co-located. (There is a small sketch after the ceph osd tree output below that dumps what Hadoop itself reports for each block.)

 

   FILE OFFSET                OBJECT         OFFSET      LENGTH    OSD
             0    10000000dbe.00000000            0    67108864      2
      67108864    10000000dbe.00000001            0    67108864     13
     134217728    10000000dbe.00000002            0    67108864      5
     201326592    10000000dbe.00000003            0    67108864      4
...

 

Ceph osd tree:

 # id    weight  type name       up/down reweight
-1      14      root default
-3      14              rack unknownrack
-2      7                       host vega7249
0       1                               osd.0   up      1
1       1                               osd.1   up      1
2       1                               osd.2   up      1
3       1                               osd.3   up      1
4       1                               osd.4   up      1
5       1                               osd.5   up      1
6       1                               osd.6   up      1
-4      7                       host vega7250
10      1                               osd.10  up      1
11      1                               osd.11  up      1
12      1                               osd.12  up      1
13      1                               osd.13  up      1
7       1                               osd.7   up      1
8       1                               osd.8   up      1
9       1                               osd.9   up      1
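
For reference, here is a small Java sketch (untested against this cluster; the input path argument is just a placeholder) that dumps what Hadoop itself reports for each block via the generic FileSystem API, which should end up in CephFileSystem.getFileBlockLocations(). The idea is to compare the host list it prints for block 0 against the OSD hosts in the extent listing above.

// Minimal sketch: print the host placement the FileSystem reports for every
// block of a file, to compare against the object/OSD mapping above.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DumpBlockLocations {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();    // picks up the same core-site.xml the job uses
        Path file = new Path(args[0]);               // e.g. one of the terasort input files
        FileSystem fs = FileSystem.get(file.toUri(), conf);
        FileStatus status = fs.getFileStatus(file);

        // Ask the FileSystem implementation (CephFileSystem here) where each block lives.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation b : blocks) {
            System.out.println("offset=" + b.getOffset()
                    + " length=" + b.getLength()
                    + " hosts=" + java.util.Arrays.toString(b.getHosts()));
        }
        fs.close();
    }
}

If block 0 doesn't come back with vega7249 in its host list, the problem is on the CephFileSystem side; if it does, the problem is in how the jobtracker is using that information.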


Thanks
KC


On Mon, Jul 8, 2013 at 3:36 PM, Noah Watkins <noah.watkins@xxxxxxxxxxx> wrote:
Yes, all of the code needed to get the locality information should be
present in the version of the jar file you referenced. We have tested
to make sure the right data is available, but have not extensively
tested that it is being used correctly by core Hadoop (e.g. that it is
being correctly propagated out of CephFileSystem). IIRC fixing this
/should/ be pretty easy; mostly fiddling with getFileBlockLocations.
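
To illustrate the shape of what getFileBlockLocations() needs to hand back (a rough sketch only, not the actual CephFileSystem code; resolveHostsForExtent() below is a hypothetical stand-in for however the CephFS API is asked which OSD hosts hold an extent):

import java.io.IOException;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;

public class LocalitySketch {

    // Hypothetical stand-in: in CephFileSystem this would query the CephFS client.
    static String[] resolveHostsForExtent(FileStatus file, long offset) throws IOException {
        return new String[] { "vega7249" };   // placeholder value only
    }

    // Map a byte range of a file onto per-block BlockLocation entries whose
    // host lists name the machines holding each block.
    static BlockLocation[] blockLocations(FileStatus file, long start, long len)
            throws IOException {
        long blockSize = file.getBlockSize();      // object size from the file layout
        long firstBlock = start / blockSize;
        long lastBlock = (start + len - 1) / blockSize;

        BlockLocation[] locations = new BlockLocation[(int) (lastBlock - firstBlock + 1)];
        for (long b = firstBlock; b <= lastBlock; b++) {
            long offset = b * blockSize;
            long length = Math.min(blockSize, file.getLen() - offset);
            String[] hosts = resolveHostsForExtent(file, offset);
            // These host strings are what the jobtracker ends up matching against
            // tasktracker hosts; if they are empty or wrong, every map task is "non-local".
            locations[(int) (b - firstBlock)] = new BlockLocation(hosts, hosts, offset, length);
        }
        return locations;
    }
}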

On Mon, Jul 8, 2013 at 1:25 PM, ker can <kercan74@xxxxxxxxx> wrote:
> Hi Noah,
>
> I'm using the CephFS jar from ...
> http://ceph.com/download/hadoop-cephfs.jar
> I believe this is built from hadoop-common/cephfs/branch-1.0?
>
> If that's the case, I should already be using an implementation that has
> getFileBlockLocations() ... which is here:
> https://github.com/ceph/hadoop-common/blob/cephfs/branch-1.0/src/core/org/apache/hadoop/fs/ceph/CephFileSystem.java
>
> Is there a command-line tool that I can use to verify the results from
> getFileBlockLocations()?
>
> thanks
> KC
>
>
>
> On Mon, Jul 8, 2013 at 3:09 PM, Noah Watkins <noah.watkins@xxxxxxxxxxx>
> wrote:
>>
>> Hi KC,
>>
>> The locality information is now collected and available to Hadoop
>> through the CephFS API, so fixing this is certainly possible. However,
>> there has not been extensive testing. I think the tasks that need to
>> be completed are (1) make sure that `CephFileSystem` is encoding the
>> correct block location in `getFileBlockLocations` (which I think is
>> currently done, but does need to be verified), and (2) make sure
>> rack information is available in the jobtracker, or optionally use a
>> flat hierarchy (i.e. default-rack).
>>
>> On Mon, Jul 8, 2013 at 12:47 PM, ker can <kercan74@xxxxxxxxx> wrote:
>> > Hi There,
>> >
>> > I'm test driving Hadoop with CephFS as the storage layer. I was running
>> > the Terasort benchmark and I noticed a lot of network IO activity when
>> > compared to an HDFS storage layer setup. (It's a half-a-terabyte sort
>> > workload over two data nodes.)
>> >
>> > Digging into the job tracker logs a little, I noticed that all the map
>> > tasks were being assigned to process a split (block) on non-local nodes,
>> > which explains all the network activity during the map phase.
>> >
>> > With Ceph:
>> >
>> >
>> > 2013-07-08 11:19:53,535 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_201307081115_0001 = 500000000000. Number of splits = 7452
>> > 2013-07-08 11:19:53,538 INFO org.apache.hadoop.mapred.JobInProgress: Job job_201307081115_0001 initialized successfully with 7452 map tasks and 32 reduce tasks.
>> >
>> > 2013-07-08 11:19:54,836 INFO org.apache.hadoop.mapred.JobInProgress: Choosing a non-local task task_201307081115_0001_m_000000
>> > 2013-07-08 11:19:54,836 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201307081115_0001_m_000000_0' to tip task_201307081115_0001_m_000000, for tracker 'tracker_vega7250:localhost/127.0.0.1:35422'
>> >
>> > 2013-07-08 11:19:54,990 INFO org.apache.hadoop.mapred.JobInProgress: Choosing a non-local task task_201307081115_0001_m_000001
>> > 2013-07-08 11:19:54,990 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201307081115_0001_m_000001_0' to tip task_201307081115_0001_m_000001, for tracker 'tracker_vega7249:localhost/127.0.0.1:36725'
>> >
>> > ... and so on.
>> >
>> > By comparison, with HDFS the job tracker logs looked something like this.
>> > The map tasks were being assigned to process data blocks on the local
>> > nodes.
>> >
>> > 2013-07-08 03:55:32,656 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_201307080351_0001 = 500000000000. Number of splits = 7452
>> > 2013-07-08 03:55:32,657 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201307080351_0001_m_000000 has split on node:/default-rack/vega7247
>> > 2013-07-08 03:55:32,657 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201307080351_0001_m_000001 has split on node:/default-rack/vega7247
>> > 2013-07-08 03:55:34,474 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201307080351_0001_m_000000_0' to tip task_201307080351_0001_m_000000, for tracker 'tracker_vega7247:localhost/127.0.0.1:43320'
>> > 2013-07-08 03:55:34,475 INFO org.apache.hadoop.mapred.JobInProgress: Choosing data-local task task_201307080351_0001_m_000000
>> > 2013-07-08 03:55:34,475 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201307080351_0001_m_000001_0' to tip task_201307080351_0001_m_000001, for tracker 'tracker_vega7247:localhost/127.0.0.1:43320'
>> > 2013-07-08 03:55:34,475 INFO org.apache.hadoop.mapred.JobInProgress: Choosing data-local task task_201307080351_0001_m_000001
>> >
>> > Version Info:
>> > ceph version 0.61.4
>> > hadoop 1.1.2
>> >
>> > Has anyone else run into this?
>> >
>> > Thanks
>> > KC
>> >
>> > _______________________________________________
>> > ceph-users mailing list
>> > ceph-users@xxxxxxxxxxxxxx
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>
>

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


