Re: Hadoop/Ceph and DFS IO tests

By the way, here's the log of the write runs (the 300 files finished in ~228 s of wall-clock time on HDFS versus ~510 s on Ceph, roughly 2.2x):

13/07/09 05:52:56 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write (HDFS)
13/07/09 05:52:56 INFO fs.TestDFSIO:            Date & time: Tue Jul 09 05:52:56 PDT 2013
13/07/09 05:52:56 INFO fs.TestDFSIO:        Number of files: 300
13/07/09 05:52:56 INFO fs.TestDFSIO: Total MBytes processed: 460800
13/07/09 05:52:56 INFO fs.TestDFSIO:      Throughput mb/sec: 50.43823691216413
13/07/09 05:52:56 INFO fs.TestDFSIO: Average IO rate mb/sec: 52.558677673339844
13/07/09 05:52:56 INFO fs.TestDFSIO:  IO rate std deviation: 12.838500708755591
13/07/09 05:52:56 INFO fs.TestDFSIO:     Test exec time sec: 227.571
13/07/09 05:52:56 INFO fs.TestDFSIO:

13/07/09 13:22:09 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write (Ceph)
13/07/09 13:22:09 INFO fs.TestDFSIO:            Date & time: Tue Jul 09 13:22:09 PDT 2013
13/07/09 13:22:09 INFO fs.TestDFSIO:        Number of files: 300
13/07/09 13:22:09 INFO fs.TestDFSIO: Total MBytes processed: 460800
13/07/09 13:22:09 INFO fs.TestDFSIO:      Throughput mb/sec: 23.40132226611945
13/07/09 13:22:09 INFO fs.TestDFSIO: Average IO rate mb/sec: 24.76653480529785
13/07/09 13:22:09 INFO fs.TestDFSIO:  IO rate std deviation: 6.141010947451576
13/07/09 13:22:09 INFO fs.TestDFSIO:     Test exec time sec: 510.087
13/07/09 13:22:09 INFO fs.TestDFSIO:

In one of the older archive posts from last year [ http://www.spinics.net/lists/ceph-devel/msg05387.html ] I saw a similar discussion of TestDFSIO performance, Ceph versus HDFS. It mentions that "one reason you might be seeing throughput issues is with the standard read/write interface that copies bytes across the JNI interface. On the short list of stuff for the next Java wrapper set is to use the ByteBuffer interface (NIO) to avoid this copying".

Is that JNI copy overhead still an issue, or have we moved past that?
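
Just to check my understanding of the copy issue, here's a rough sketch with made-up native method names (not the actual libcephfs Java wrapper API):

  import java.nio.ByteBuffer;

  // Illustration only: neither native method below is the real libcephfs binding.
  class JniCopySketch {
      // Heap byte[]: the native side has to go through GetByteArrayRegion /
      // GetByteArrayElements, so the data is copied (or pinned) between native
      // memory and the JVM heap on every read and write.
      static native long readViaArray(int fd, byte[] buf, long offset, int len);

      // Direct ByteBuffer: GetDirectBufferAddress hands native code a stable
      // pointer, so it can fill or drain the buffer in place -- no extra copy.
      static native long readViaDirectBuffer(int fd, ByteBuffer buf, long offset, int len);

      // A reusable direct buffer like this is what an NIO-based wrapper would work with.
      static final ByteBuffer DIRECT = ByteBuffer.allocateDirect(4 * 1024 * 1024);
  }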

Thanks!




On Tue, Jul 9, 2013 at 3:01 PM, ker can <kercan74@xxxxxxxxx> wrote:
For this particular test I turned off replication for both HDFS and Ceph, so there is just one copy of the data lying around.

hadoop@vega7250:~$ ceph osd dump | grep rep
pool 0 'data' rep size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 960 pgp_num 960 last_change 26 owner 0 crash_replay_interval 45
pool 1 'metadata' rep size 2 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 960 pgp_num 960 last_change 1 owner 0
pool 2 'rbd' rep size 2 min_size 1 crush_ruleset 2 object_hash rjenkins pg_num 960 pgp_num 960 last_change 1 owner 0

From hdfs-site.xml:

  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>





On Tue, Jul 9, 2013 at 2:44 PM, Noah Watkins <noah.watkins@xxxxxxxxxxx> wrote:
On Tue, Jul 9, 2013 at 12:35 PM, ker can <kercan74@xxxxxxxxx> wrote:
> Hi Noah,
>
> while we're still on the Hadoop topic ... I was also trying out the
> TestDFSIO tests, Ceph vs. HDFS. The read tests on Ceph take about 1.5x
> the HDFS time. The write tests are worse ... about 2.5x the time on HDFS,
> but I guess we have additional journaling overheads for the writes on Ceph.
> But there should be no such overheads for the reads?

Out of the box Hadoop will keep 3 copies, and Ceph 2, so it could be
the case that reads are slower because there is less opportunity for
scheduling local reads. You can create a new pool with replication=3
and test this out (documentation on how to do this is on
http://ceph.com/docs/wip-hadoop-doc/cephfs/hadoop/).

As for writes, Hadoop writes 1 local block and 2 remote blocks, whereas
Ceph writes all copies remotely, so there is some overhead for the extra
remote object write compared to Hadoop, but I wouldn't have expected
2.5x. It might be useful to run dd or something like that against Ceph
to see whether the raw numbers make sense and to rule out Hadoop as the
bottleneck.
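
Instead of dd, even a plain sequential write from a small Java program
against the mounted filesystem would give you a comparable raw number;
here's a minimal sketch (the /mnt/cephfs mount point and file name are
just placeholders for wherever CephFS is mounted):

  import java.io.IOException;
  import java.nio.ByteBuffer;
  import java.nio.channels.FileChannel;
  import java.nio.file.Paths;
  import java.nio.file.StandardOpenOption;

  public class SeqWriteCheck {
      public static void main(String[] args) throws IOException {
          // Placeholder path: point this at a file on the CephFS mount.
          String path = args.length > 0 ? args[0] : "/mnt/cephfs/seq-write-test";
          long totalBytes = 1536L * 1024 * 1024;        // 1536 MB, same size as one TestDFSIO file
          ByteBuffer buf = ByteBuffer.allocateDirect(4 * 1024 * 1024); // 4 MB writes

          long start = System.nanoTime();
          try (FileChannel ch = FileChannel.open(Paths.get(path),
                  StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                  StandardOpenOption.TRUNCATE_EXISTING)) {
              long written = 0;
              while (written < totalBytes) {
                  buf.clear();                          // reuse the same 4 MB of (zero) data
                  written += ch.write(buf);
              }
              ch.force(false);                          // make sure data is flushed before stopping the clock
          }
          double secs = (System.nanoTime() - start) / 1e9;
          System.out.printf("wrote %d MB in %.1f s = %.1f MB/s%n",
                  totalBytes >> 20, secs, (totalBytes >> 20) / secs);
      }
  }

Comparing that number against the TestDFSIO per-file rates should show
whether the gap is coming from Hadoop or from the filesystem underneath.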

-Noah


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
