Re: monitoring performance

Noah Watkins <noah.watkins@xxxxxxxxxxx> · Sun, 7 Apr 2013 21:54:05 -0700

Yup
On Apr 7, 2013 9:49 PM, "Varun Chandramouli" <varun.c37@xxxxxxxxx> wrote:

Just to check if my understanding is correct: on the JobTracker node, Ceph computes the locations of the blocks of the input files. The JobTracker then tries to schedule the mappers to run on those particular nodes.

Thanks
Varun

On Wed, Apr 3, 2013 at 11:50 PM, Noah Watkins <noah.watkins@xxxxxxxxxxx> wrote:

On Tue, Apr 2, 2013 at 4:18 AM, Varun Chandramouli <varun.c37@xxxxxxxxx> wrote:

Another question I had was regarding Hadoop MapReduce on Ceph. I believe that on HDFS, the JobTracker tries to schedule jobs locally, using the necessary information from the NameNode. On Ceph, how is this ensured, given that a file may be divided into multiple objects, which may be on different OSDs? Does the JobTracker get the locations of the objects from the MDS and schedule the jobs locally?

Like HDFS, the Ceph implementation of the Hadoop file system interface exposes the location of file blocks (or objects, in the case of Ceph), including the name and rack of the OSDs that store each block.
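To illustrate how a scheduler might use that location information, here is a rough sketch in Python. It is not the JobTracker's actual code: the function name pick_node, the rack_of mapping, and the host names are all made up for the example. The only idea taken from the thread is that each block comes with the hosts (and racks) that store it, and that the scheduler prefers those hosts.

```python
def pick_node(split_hosts, free_nodes, rack_of):
    """Pick a node for a map task (hypothetical helper, not a Hadoop API).

    split_hosts: hosts reported as storing the task's input block
    free_nodes:  nodes that currently have a free map slot
    rack_of:     mapping from host name to rack name
    """
    # First choice: a free node that stores the block itself.
    for node in free_nodes:
        if node in split_hosts:
            return node, "node-local"

    # Second choice: a free node in the same rack as one of the replicas.
    split_racks = {rack_of[h] for h in split_hosts if h in rack_of}
    for node in free_nodes:
        if rack_of.get(node) in split_racks:
            return node, "rack-local"

    # Otherwise take any free node, or report that nothing is available.
    if free_nodes:
        return free_nodes[0], "off-rack"
    return None, "none"
```

With rack_of = {"osd1": "/r1", "osd2": "/r1", "osd3": "/r2"}, a block stored on osd1 would go to osd1 if it has a free slot, to osd2 if only osd2 and osd3 are free, and to osd3 if nothing closer is available.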

Unlike HDFS, which has to ask the NameNode for block lists, Ceph does not rely on expensive communication with the MDS to retrieve them. Instead, the client computes block locations locally. This can speed up job startup for large jobs that would otherwise suffer the latency of MDS round trips.
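A simplified sketch of what "computed on the client" can look like, in Python. Several things here are assumptions for illustration: a plain layout where the file is cut into fixed-size 4 MB objects with no striping across objects, an object naming scheme of hex inode number plus zero-padded hex object index, and a toy hash-based placement function. The placement function is only a stand-in to show that no server lookup is needed; it is not CRUSH, which is what Ceph actually uses.

```python
import hashlib

OBJECT_SIZE = 4 * 1024 * 1024  # assumed object size: 4 MB


def objects_for_range(offset, length, object_size=OBJECT_SIZE):
    """Return the indices of the objects covering [offset, offset + length)."""
    if length <= 0:
        return []
    first = offset // object_size
    last = (offset + length - 1) // object_size
    return list(range(first, last + 1))


def object_name(inode, index):
    """Build an object name as <hex inode>.<8-digit hex index> (assumed scheme)."""
    return f"{inode:x}.{index:08x}"


def place(name, osds, replicas=2):
    """Toy deterministic placement (NOT CRUSH): rank OSDs by a hash of
    the object name combined with the OSD name and keep the first few."""
    ranked = sorted(
        osds,
        key=lambda osd: hashlib.sha1(f"{name}:{osd}".encode()).hexdigest(),
    )
    return ranked[:replicas]


def locate(inode, offset, length, osds):
    """Map a byte range of a file to (object name, OSD list) pairs,
    using only local arithmetic and hashing."""
    return [
        (object_name(inode, i), place(object_name(inode, i), osds))
        for i in objects_for_range(offset, length)
    ]
```

Every client that runs locate() with the same inputs gets the same answer, so none of them has to ask a metadata server where a byte range lives. That is the property being described above; the real placement logic is considerably more involved.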

-Noah

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com