On Tue, Jul 9, 2013 at 12:35 PM, ker can <kercan74@xxxxxxxxx> wrote:
> hi Noah,
>
> while we're still on the hadoop topic ... I was also trying out the
> TestDFSIO tests, ceph vs. hadoop. The read tests on ceph take about 1.5x
> the hdfs time. The write tests are worse, about 2.5x the time on hdfs,
> but I guess we have additional journaling overheads for the writes on ceph.
> But there should be no such overheads for the reads?

Out of the box, Hadoop keeps 3 copies of each block and Ceph keeps 2, so reads
may be slower because there is less opportunity to schedule local reads. You
can create a new pool with replication=3 and test this out (documentation on
how to do this is at http://ceph.com/docs/wip-hadoop-doc/cephfs/hadoop/).

As for writes, Hadoop writes 2 remote blocks and 1 local block, whereas Ceph
writes all copies remotely. So there is some overhead for the extra remote
object write (compared to Hadoop), but I wouldn't have expected 2.5x. It might
be useful to run dd or something similar directly against Ceph to see whether
the numbers make sense, which would rule Hadoop in or out as the bottleneck.

-Noah

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
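For the "run dd or something similar" suggestion, here is a minimal dd-style sketch in Python, assuming a CephFS file system is already mounted on the test node and you pass a file path on that mount as the first argument. It writes a file sequentially, fsyncs it, then reads it back and reports MiB/s for each phase, with no Hadoop in the path. The function name `measure_throughput`, the 64 MiB total size and the 4 MiB block size are arbitrary, hypothetical choices, not anything from Ceph or Hadoop. The read number can be optimistic if pages are still cached locally; the sketch asks the kernel to drop them with `posix_fadvise` where that call is available (Linux), but reading from a different client is the more reliable check.

```python
import os
import sys
import tempfile
import time


def measure_throughput(path, total_mib=64, block_kib=4096):
    """dd-style test: write then read back ~total_mib MiB at `path`.

    Returns (bytes_written, write_mib_per_s, read_mib_per_s).
    Sizes are hypothetical defaults chosen only for illustration.
    """
    block = os.urandom(block_kib * 1024)
    n_blocks = max(1, (total_mib * 1024) // block_kib)

    # Sequential write, forced to stable storage before timing stops
    # (similar in spirit to `dd conv=fsync`).
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(n_blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())
    write_s = time.perf_counter() - start
    written = n_blocks * len(block)

    with open(path, "rb") as f:
        # Try to evict cached pages so the read is not served from local RAM.
        # Only available on some platforms; otherwise the read figure may be optimistic.
        if hasattr(os, "posix_fadvise"):
            try:
                os.posix_fadvise(f.fileno(), 0, 0, os.POSIX_FADV_DONTNEED)
            except OSError:
                pass
        start = time.perf_counter()
        read = 0
        while True:
            chunk = f.read(len(block))
            if not chunk:
                break
            read += len(chunk)
        read_s = time.perf_counter() - start

    if read != written:
        raise IOError(f"read {read} bytes, expected {written}")

    mib = written / (1024 * 1024)
    return written, mib / max(write_s, 1e-9), mib / max(read_s, 1e-9)


if __name__ == "__main__":
    # Pass a path on the CephFS mount; falls back to a local temp file otherwise.
    default = os.path.join(tempfile.gettempdir(), "tput_test.bin")
    target = sys.argv[1] if len(sys.argv) > 1 else default
    n, w, r = measure_throughput(target)
    print(f"{n} bytes: write {w:.1f} MiB/s, read {r:.1f} MiB/s")
    os.remove(target)
```

Running the same script against a local disk and against the CephFS mount gives a rough baseline; if raw CephFS throughput already shows a similar read/write gap to what TestDFSIO reports, Hadoop is unlikely to be the bottleneck.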