Re: monitoring performance

Just to check if my understanding is correct:
On the JobTracker node, Ceph computes the locations of the input files' blocks. The JobTracker then tries to schedule the mappers to run on those particular nodes.

Thanks
Varun


On Wed, Apr 3, 2013 at 11:50 PM, Noah Watkins <noah.watkins@xxxxxxxxxxx> wrote:
On Tue, Apr 2, 2013 at 4:18 AM, Varun Chandramouli <varun.c37@xxxxxxxxx> wrote:

Another question I had was regarding Hadoop MapReduce on Ceph. I believe that on HDFS, the JobTracker tries to schedule jobs locally, using the necessary information from the NameNode. How is this ensured on Ceph, given that a file may be divided into multiple objects that may reside on different OSDs? Does the JobTracker get the locations of the objects from the MDS and schedule the jobs locally?

Like HDFS, the Ceph implementation of the Hadoop file system interface exposes the location of file blocks (objects, in the case of Ceph), such as the names and racks of the OSDs that store each block.
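
To make the locality decision concrete, here is a minimal sketch (in Python, with illustrative names, not the actual Hadoop or Ceph bindings) of the kind of per-block location record a scheduler consumes, and how it would pick hosts for an input split:

```python
from dataclasses import dataclass, field

@dataclass
class BlockLocation:
    """Illustrative stand-in for Hadoop's per-block location info."""
    offset: int                      # byte offset of the block within the file
    length: int                      # block length in bytes
    hosts: list = field(default_factory=list)     # OSD hostnames storing this object
    topology: list = field(default_factory=list)  # rack-aware paths, e.g. "/rack1/osd-3"

def split_locality(locations, split_offset, split_length):
    """Return the hosts holding data that overlaps a given input split.
    A scheduler prefers to run the mapper on one of these hosts."""
    hosts = set()
    split_end = split_offset + split_length
    for loc in locations:
        # half-open interval overlap test
        if loc.offset < split_end and split_offset < loc.offset + loc.length:
            hosts.update(loc.hosts)
    return sorted(hosts)
```

A split that straddles two objects yields the union of both objects' hosts, so the scheduler can still pick any node with at least part of the data local.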

Unlike HDFS, Ceph does not rely on expensive communication with the MDS to retrieve block lists. Ceph computes object locations locally on the client. This can speed up job startup for large jobs that would otherwise incur the latency of many MDS round trips.
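
The key idea is that placement is a deterministic function of the object's name and the cluster map, so any client can compute it without asking a metadata server. Here is a toy Python sketch of that principle; the hashing scheme is a hypothetical stand-in for CRUSH, and the `inode.index` object-naming convention is only illustrative:

```python
import hashlib

def object_name(inode, index):
    """CephFS-style object naming (illustrative): inode plus stripe index."""
    return f"{inode:x}.{index:08x}"

def locate(inode, index, osds, replicas=2):
    """Deterministically map an object to OSDs by hashing its name.
    Toy stand-in for CRUSH: every client that knows the OSD list
    computes the same placement, with no MDS round trip."""
    digest = hashlib.sha256(object_name(inode, index).encode()).digest()
    h = int.from_bytes(digest[:8], "big")
    n = len(osds)
    # pick `replicas` distinct OSDs starting from the hashed position
    return [osds[(h + i) % n] for i in range(min(replicas, n))]
```

Because the function is pure, the JobTracker-side client and every other client agree on where each object lives, which is what makes cheap, local block-location queries possible.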

-Noah
 



_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
