And here are the perfcounters, a few seconds after Ceph starts writing data
(disk util goes up in dstat).

2012/1/16 Andrey Stepachev <octo47@xxxxxxxxx>:
> Hi all.
>
> Last week I investigated the status of Hadoop on Ceph.
> I created some patches to fix some bugs and crashes.
> It looks like it works. Even HBase works on top.
>
> For reference, all sources and patches are here:
>
> https://github.com/octo47/hadoop-common/tree/branch-1.0-ceph
> https://github.com/octo47/ceph/tree/v0.40-hadoop
>
> After YCSB and TestDFSIO ran without crashes, I started investigating
> performance.
>
> I have a 5-node cluster with 4 SATA disks each, btrfs, 24 cores per node,
> RAID. iozone shows up to 520MB/s.
>
> Performance differs by 2-3x. After some tests I noticed a strange thing:
> Hadoop uses the disks very much like iozone does: a small number of IOPS
> and high throughput (the same as iozone).
> Ceph uses them very inefficiently: a huge number of IOPS and up to 3 times
> less throughput (I think because of the high number of IOPS).
>
> hadoop dstat output:
> sda--sdb--sdc--sdd- ----total-cpu-usage---- -dsk/total- --io/total-
> util:util:util:util|usr sys idl wai hiq siq| read  writ| read  writ
>  100: 100: 100: 100|  1   5  83  11   0   0|   0   529M|   0   247
>  100: 100: 100: 100|  1   0  83  16   0   0|   0   542M|   0   168
>  100: 100: 100: 100|  1   0  81  18   0   0|  28k  518M|6.00   149
>  100: 100: 100: 100|  1   4  77  17   0   0|   0   533M|   0   243
>  100: 100: 100: 100|  1   3  83  13   0   0|   0   523M|   0   264
>
> ceph dstat output:
> ===================================================
> sda--sdb--sdc--sdd- ----total-cpu-usage---- -dsk/total- --io/total-
> util:util:util:util|usr sys idl wai hiq siq| read  writ| read  writ
> 68.0:70.0:79.0:76.0|  1   2  93   4   0   0|   0   195M|   0  1723
> 86.0:85.0:93.0:91.0|  1   2  91   5   0   0|   0   226M|   0  1816
> 85.0:85.0:85.0:84.0|  1   3  92   4   0   0|   0   235M|   0  2316
>
> So, my question is, can someone point me to:
> a) can it be because of an inefficient buffer size on the OSD side?
> (I tried increasing the CephOutputStream buffer to 256KB; it did not help.)
> b) what other problems could there be, and what options can I tune
> to find out what is going on?
>
> PS: I can't use iozone on a kernel-mounted fs. Something hangs in the
> kernel and only a reboot helps. In /var/log/messages I see the attached
> kern.log.
>
> --
> Andrey.

--
Andrey.
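To be concrete about question (b) above: the queue limits that show up in
the perfcounters below correspond to ceph.conf options on the OSD side.
A rough sketch of the knobs I mean, with purely illustrative values rather
than recommendations:

    [osd]
        # journal queue limits (journal_queue_max_bytes / journal_queue_max_ops below)
        journal queue max bytes = 209715200
        journal queue max ops = 1000
        # filestore op queue limits (op_queue_max_bytes / op_queue_max_ops below)
        filestore queue max bytes = 209715200
        filestore queue max ops = 1000
        # filestore worker threads
        filestore op threads = 4

Whether raising any of these actually helps is exactly what I am unsure about.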
{ "filestore" : { "apply_latency" : { "avgcount" : 3752, "sum" : 107.00700000000001 }, "bytes" : 1084026405, "commitcycle" : 10, "commitcycle_interval" : { "avgcount" : 10, "sum" : 57.902500000000003 }, "commitcycle_latency" : { "avgcount" : 10, "sum" : 7.89201 }, "committing" : 0, "journal_bytes" : 974956029, "journal_full" : 0, "journal_latency" : { "avgcount" : 3739, "sum" : 1361.1199999999999 }, "journal_ops" : 3739, "journal_queue_bytes" : 109070376, "journal_queue_max_bytes" : 104857600, "journal_queue_max_ops" : 500, "journal_queue_ops" : 13, "op_queue_bytes" : 26742974, "op_queue_max_bytes" : 104857600, "op_queue_max_ops" : 500, "op_queue_ops" : 3, "ops" : 3752 }, "osd" : { "buffer_bytes" : 0, "heartbeat_from_peers" : 4, "heartbeat_to_peers" : 4, "loadavg" : 0.41999999999999998, "map_message_epoch_dups" : 10, "map_message_epochs" : 14, "map_messages" : 11, "numpg" : 625, "numpg_primary" : 209, "numpg_replica" : 416, "numpg_stray" : 0, "op" : 137, "op_in_bytes" : 160997316, "op_latency" : { "avgcount" : 137, "sum" : 190.98699999999999 }, "op_out_bytes" : 26871, "op_r" : 9, "op_r_latency" : { "avgcount" : 9, "sum" : 3.02433 }, "op_r_out_bytes" : 26871, "op_rw" : 0, "op_rw_in_bytes" : 0, "op_rw_latency" : { "avgcount" : 0, "sum" : 0 }, "op_rw_out_bytes" : 0, "op_rw_rlat" : { "avgcount" : 0, "sum" : 0 }, "op_w" : 128, "op_w_in_bytes" : 160997316, "op_w_latency" : { "avgcount" : 128, "sum" : 187.96299999999999 }, "op_w_rlat" : { "avgcount" : 128, "sum" : 75.012200000000007 }, "op_wip" : 5, "opq" : 7, "pull" : 0, "push" : 0, "push_out_bytes" : 0, "recovery_ops" : 0, "subop" : 334, "subop_in_bytes" : 735611145, "subop_latency" : { "avgcount" : 334, "sum" : 238.98599999999999 }, "subop_pull" : 0, "subop_pull_latency" : { "avgcount" : 0, "sum" : 0 }, "subop_push" : 0, "subop_push_in_bytes" : 0, "subop_push_latency" : { "avgcount" : 0, "sum" : 0 }, "subop_w" : 0, "subop_w_in_bytes" : 735611145, "subop_w_latency" : { "avgcount" : 334, "sum" : 238.98599999999999 } } }