On Mon, 16 Jan 2012, Andrey Stepachev wrote:
> Oops. It really is a buffer problem.
> It can be checked easily with ceph osd tell 4 bench:
>
> bench: wrote 1024 MB in blocks of 15625 KB in 17.115538 sec at 61264 KB/sec
> bench: wrote 1024 MB in blocks of 122 MB in 12.281531 sec at 85378 KB/sec
> bench: wrote 1024 MB in blocks of 244 MB in 13.529501 sec at 77502 KB/sec
> bench: wrote 3814 MB in blocks of 488 MB in 30.909198 sec at 123 MB/sec
>
> and in the last case dstat shows 'iozone-like iops':
>
> 100: 100: 100: 100|  0   0  94   6   0   0|   0   538M|   0   238
> 100: 100: 100: 100|  0   0  96   3   0   0|   0   525M|   0   133
> 100: 100: 100: 100|  0   2  95   3   0   0|   0   497M|   0   128
> 18.0:39.0:30.0:27.0|  0   3  96   2   0   0|   0   144M|   0  40.0
> 75.0:74.0:72.0:67.0|  0   3  95   2   0   0|   0   103M|   0  2698
> 100: 100: 100: 100|  0  13  83   4   0   0|   0   484M|   0   125
> 100: 100: 100: 100|  0   0 100   0   0   0|   0   486M|   0   124
> 100: 100: 100: 100|  0   3  88   9   0   0|   0   476M|   0   123
>
> Now the question arises: how can ceph be tuned to get this kind of
> performance in normal operation, not just in bench?

This may be related to how your OSD journaling is configured.  I'm
guessing it's set to a file inside the btrfs volume holding the data?

sage

> 2012/1/16 Andrey Stepachev <octo47@xxxxxxxxx>:
> > Hi all.
> >
> > Last week I investigated the status of hadoop on ceph.
> > I created some patches to fix some bugs and crashes.
> > Looks like it works.  Even hbase works on top.
> >
> > For reference, all sources and patches are here:
> >
> > https://github.com/octo47/hadoop-common/tree/branch-1.0-ceph
> > https://github.com/octo47/ceph/tree/v0.40-hadoop
> >
> > After YCSB and TestDFSIO worked without crashes I started
> > investigating performance.
> >
> > I have a 5-node cluster with 4 SATA disks per node, btrfs, 24 cores
> > on each, raid.  iozone shows up to 520 MB/s.
> >
> > Performance differs by 2-3x.  After some tests I see a strange thing:
> > hadoop uses the disks much like iozone does: a small number of iops
> > and high throughput (same as iozone).
> > ceph uses them very inefficiently: a huge number of iops and up to 3
> > times less throughput (I think because of the high number of iops).
> >
> > hadoop dstat output:
> > sda--sdb--sdc--sdd- ----total-cpu-usage---- -dsk/total- --io/total-
> > util:util:util:util|usr sys idl wai hiq siq| read  writ| read  writ
> > 100: 100: 100: 100|  1   5  83  11   0   0|   0   529M|   0   247
> > 100: 100: 100: 100|  1   0  83  16   0   0|   0   542M|   0   168
> > 100: 100: 100: 100|  1   0  81  18   0   0|  28k  518M|6.00   149
> > 100: 100: 100: 100|  1   4  77  17   0   0|   0   533M|   0   243
> > 100: 100: 100: 100|  1   3  83  13   0   0|   0   523M|   0   264
> >
> > ceph dstat output:
> > ===================================================
> > sda--sdb--sdc--sdd- ----total-cpu-usage---- -dsk/total- --io/total-
> > util:util:util:util|usr sys idl wai hiq siq| read  writ| read  writ
> > 68.0:70.0:79.0:76.0|  1   2  93   4   0   0|   0   195M|   0  1723
> > 86.0:85.0:93.0:91.0|  1   2  91   5   0   0|   0   226M|   0  1816
> > 85.0:85.0:85.0:84.0|  1   3  92   4   0   0|   0   235M|   0  2316
> >
> > So, my question is, can someone point me at:
> > a) can it be because of an inefficient buffer size on the osd side?
> >    (I tried increasing the CephOutputStream buffer to 256 KB; it
> >    didn't help)
> > b) what other problems could there be, and what options can I tune
> >    to find out what is going on?
> >
> > PS: I can't use iozone on a kernel-mounted fs.  Something hangs in
> > the kernel, and only a reboot helps.  In /var/log/messages I see the
> > attached kern.log.
> >
> > --
> > Andrey.
>
> --
> Andrey.
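
For context, a minimal ceph.conf sketch of the layout Sage is asking about
(paths, hostname, and sizes below are made up, not taken from the thread):
if "osd journal" points at a file inside the btrfs volume that holds the
object data, every object write also goes to the journal on the same
spindles and competes with the filestore writes.  Putting the journal on a
separate partition or device avoids that:

    [osd]
            ; only used when the journal is a plain file
            osd journal size = 1000        ; MB

    [osd.4]
            host = node1                   ; hypothetical node name
            osd data = /data/osd.4         ; btrfs volume holding the objects
            ; journal on a separate raw partition instead of a file
            ; under the data volume
            osd journal = /dev/sdb1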
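
(Side note: the dstat columns quoted above look like output from roughly
the following invocation; the exact flags are a guess, not taken from the
thread.)

    dstat --disk-util -c -d --io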