Re: Slow ceph io. High iops. Compared to hadoop.

On Mon, 16 Jan 2012, Andrey Stepachev wrote:
> Oops. It really is a buffer problem.
> It can easily be checked with 'ceph osd tell 4 bench':
> 
> bench: wrote 1024 MB in blocks of 15625 KB in 17.115538 sec at 61264 KB/sec
> bench: wrote 1024 MB in blocks of 122 MB in 12.281531 sec at 85378 KB/sec
> bench: wrote 1024 MB in blocks of 244 MB in 13.529501 sec at 77502 KB/sec
> bench: wrote 3814 MB in blocks of 488 MB in 30.909198 sec at 123 MB/sec
> 
> and in the last case dstat shows 'iozone-like iops':
> 
>  100: 100: 100: 100|  0   0  94   6   0   0|   0   538M|   0   238
>  100: 100: 100: 100|  0   0  96   3   0   0|   0   525M|   0   133
>  100: 100: 100: 100|  0   2  95   3   0   0|   0   497M|   0   128
> 18.0:39.0:30.0:27.0|  0   3  96   2   0   0|   0   144M|   0  40.0
> 75.0:74.0:72.0:67.0|  0   3  95   2   0   0|   0   103M|   0  2698
>  100: 100: 100: 100|  0  13  83   4   0   0|   0   484M|   0   125
>  100: 100: 100: 100|  0   0 100   0   0   0|   0   486M|   0   124
>  100: 100: 100: 100|  0   3  88   9   0   0|   0   476M|   0   123
> 
> Now the question arises: how can ceph be tuned to achieve such
> performance in normal operation, not just in the benchmark?

This may be related to how your OSD journaling is configured.  I'm 
guessing it's set to a file inside the btrfs volume holding the data?
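If the journal does share the btrfs data volume, every write hits the same
disks twice and the journal's small sequential writes get interleaved with
the filestore's, which could account for the inflated iops counts. A minimal
sketch of pointing the journal at a dedicated device instead (the device
path and size below are hypothetical examples, not values from your
cluster):

```ini
[osd]
    ; write-ahead journal on a dedicated device (or partition) instead
    ; of a file inside the btrfs data volume; path is an example
    osd journal = /dev/sde1
    ; journal size in MB; example value, tune to your workload
    osd journal size = 1024
```

You can confirm what an osd is actually using by checking the [osd] /
[osd.N] sections of its ceph.conf (or, if your build supports it, asking
the daemon for its resolved osd_journal value).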

sage

> 
> 2012/1/16 Andrey Stepachev <octo47@xxxxxxxxx>:
> > Hi all.
> >
> > Last week I investigated the status of hadoop on ceph.
> > I created some patches to fix a few bugs and crashes.
> > It looks like it works; even hbase runs on top.
> >
> > For reference all sources and patches are here
> >
> > https://github.com/octo47/hadoop-common/tree/branch-1.0-ceph
> > https://github.com/octo47/ceph/tree/v0.40-hadoop
> >
> > Once YCSB and TestDFSIO ran without crashes, I started investigating
> > performance.
> >
> > I have a 5-node cluster, each node with 24 cores and 4 SATA disks
> > (btrfs, RAID). iozone shows up to 520 MB/s.
> >
> > Performance differs by a factor of 2-3. After some tests I noticed a
> > strange thing: hadoop uses the disks very much like iozone does,
> > with few iops and high throughput, while ceph uses them very
> > inefficiently, with a huge number of iops and up to 3 times lower
> > throughput (I think because of the high iops count).
> >
> > hadoop dstat output:
> > sda--sdb--sdc--sdd- ----total-cpu-usage---- -dsk/total- --io/total-
> > util:util:util:util|usr sys idl wai hiq siq| read  writ| read  writ
> >  100: 100: 100: 100|  1   5  83  11   0   0|   0   529M|   0   247
> >  100: 100: 100: 100|  1   0  83  16   0   0|   0   542M|   0   168
> >  100: 100: 100: 100|  1   0  81  18   0   0|  28k  518M|6.00   149
> >  100: 100: 100: 100|  1   4  77  17   0   0|   0   533M|   0   243
> >  100: 100: 100: 100|  1   3  83  13   0   0|   0   523M|   0   264
> >
> > ceph dstat output:
> > sda--sdb--sdc--sdd- ----total-cpu-usage---- -dsk/total- --io/total-
> > util:util:util:util|usr sys idl wai hiq siq| read  writ| read  writ
> > 68.0:70.0:79.0:76.0|  1   2  93   4   0   0|   0   195M|   0  1723
> > 86.0:85.0:93.0:91.0|  1   2  91   5   0   0|   0   226M|   0  1816
> > 85.0:85.0:85.0:84.0|  1   3  92   4   0   0|   0   235M|   0  2316
> >
> >
> > So, my question is: can someone point me to
> > a) whether this could be caused by an inefficient buffer size on the
> > osd side (I tried increasing the CephOutputStream buffer to 256kb;
> > it did not help), and
> > b) what other problems there could be, and what options I can tune
> > to find out what is going on.
> >
> > PS: I can't run iozone on a kernel-mounted fs; something hangs in
> > the kernel and only a reboot helps. In /var/log/messages I see the
> > attached kern.log.
> >
> >
> >
> > --
> > Andrey.
> 
> 
> 
> -- 
> Andrey.
