Re: Ceph benchmark high wait on journal device

On Mon, 15 Oct 2012, Travis Rhoden wrote:
> Martin,
> 
> > btw.
> >
> > Is there a nice way to format the output of ceph --admin-daemon
> > ceph-osd.0.asok perf_dump?
> 
> I use:
> 
> ceph --admin-daemon /var/run/ceph/ceph-osd.3.asok perf dump | python -mjson.tool

There is also ceph.git/src/script/perf-watch.py -s foo.asok <list of vars or var prefixes>
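
If that script isn't handy, the same idea is easy to sketch against the ceph CLI.
This is only a rough, untested stand-in for perf-watch.py, not the real thing; the
socket path and counter names below are just examples taken from this thread:

import json
import subprocess
import time

# Rough stand-in for perf-watch.py: poll "perf dump" over the admin socket
# (via the ceph CLI) and print a few selected counters every few seconds.
SOCK = "/var/run/ceph/ceph-osd.0.asok"                           # example path
VARS = ["filestore.journal_ops", "filestore.journal_queue_ops"]  # example counters

def perf_dump(sock):
    out = subprocess.check_output(["ceph", "--admin-daemon", sock, "perf", "dump"])
    return json.loads(out.decode("utf-8"))

def lookup(dump, dotted):
    node = dump
    for key in dotted.split("."):
        node = node[key]
    return node

while True:
    dump = perf_dump(SOCK)
    print(" ".join("%s=%s" % (v, lookup(dump, v)) for v in VARS))
    time.sleep(5)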

s
> 
>  - Travis
> 
> 
> On Mon, Oct 15, 2012 at 4:38 PM, Martin Mailand <martin@xxxxxxxxxxxx> wrote:
> > Hi Mark,
> >
> > I think there are no differences between the 9266-8i and the 9265-8i, except
> > for the CacheVault module and the angle of the SAS connectors.
> > In the last test, which I posted, the SSDs were connected to the onboard
> > SATA ports. Further tests showed that when I reduce the object size (the -b
> > option) to 1M, 512k or 256k, the latency almost vanishes.
> > With 256k the w_wait was around 1ms.
> > So my observation is almost the opposite of yours.
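> >
> > For reference, that sweep was nothing fancy, just repeated rados bench runs
> > with different -b values. A throwaway sketch of it (pool name, runtime and
> > sizes are simply the ones from this test, and it assumes the rados CLI is in
> > the PATH):
> >
> > import subprocess
> >
> > POOL = "rbd"
> > SIZES = {"1M": 1048576, "512k": 524288, "256k": 262144}   # object sizes to test
> >
> > # Run a short write benchmark per object size; output goes to stdout as usual.
> > for label, size in sorted(SIZES.items(), key=lambda kv: -kv[1]):
> >     print("### object size %s" % label)
> >     subprocess.call(["rados", "bench", "-p", POOL, "60", "write",
> >                      "-t", "16", "-b", str(size)])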
> >
> > I use a single controller with a dual expander backplane.
> >
> > That's the baby.
> >
> > http://85.214.49.87/ceph/testlab/IMAG0018.jpg
> >
> > btw.
> >
> > Is there a nice way to format the output of ceph --admin-daemon
> > ceph-osd.0.asok perf_dump?
> >
> >
> > -martin
> >
> > On 15.10.2012 at 21:50, Mark Nelson wrote:
> >>
> >> Hi Martin,
> >>
> >> I haven't tested the 9266-8i specifically, but it may behave similarly
> >> to the 9265-8i.  This is just a theory, but I get the impression that
> >> the controller itself introduces some latency getting data to disk, and
> >> that it may get worse as more data is pushed across the controller.
> >> That seems to be the case even if the data is not going to the disk in
> >> question.  Are you using a single controller with expanders?  On some of
> >> our nodes that use a single controller with lots of expanders, I've
> >> noticed high IO wait times, especially when doing lots of small writes.
> >>
> >> Mark
> >>
> >> On 10/15/2012 11:12 AM, Martin Mailand wrote:
> >>>
> >>> Hi,
> >>>
> >>> Inspired by the performance tests Mark did, I tried to put together my
> >>> own.
> >>> I have four OSD processes on one node; each process has an Intel 710 SSD
> >>> for its journal and 4 SAS disks via an LSI 9266-8i in RAID 0.
> >>> If I test the SSDs with fio they are quite fast and the w_wait time is
> >>> quite low.
> >>> But if I run rados bench on the cluster, the w_wait times for the
> >>> journal devices are quite high (around 20-40ms).
> >>> I thought the SSDs would do better; any idea what happened here?
> >>>
> >>> -martin
> >>>
> >>> Logs:
> >>>
> >>> /dev/sd{c,d,e,f}
> >>> Intel SSD 710 200G
> >>>
> >>> /dev/sd{g,h,i,j}
> >>> each a RAID 0 of 4 SAS disks on the LSI 9266-8i
> >>>
> >>> fio -name iops -rw=write -size=10G -iodepth 1 -filename /dev/sdc2
> >>> -ioengine libaio -direct 1 -bs 256k
> >>>
> >>> Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
> >>> <- snip ->
> >>> sdc 0,00 0,00 0,00 809,20 0,00 202,30 512,00 0,96 1,19 0,00 1,19 1,18 95,84
> >>> <- snap ->
> >>>
> >>>
> >>>
> >>> rados bench -p rbd 300 write -t 16
> >>>
> >>> 2012-10-15 17:53:17.058383 min lat: 0.035382 max lat: 0.469604 avg lat: 0.189553
> >>> sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
> >>> 300 16 25329 25313 337.443 324 0.274815 0.189553
> >>> Total time run: 300.169843
> >>> Total writes made: 25329
> >>> Write size: 4194304
> >>> Bandwidth (MB/sec): 337.529
> >>>
> >>> Stddev Bandwidth: 25.1568
> >>> Max bandwidth (MB/sec): 372
> >>> Min bandwidth (MB/sec): 0
> >>> Average Latency: 0.189597
> >>> Stddev Latency: 0.0641609
> >>> Max latency: 0.469604
> >>> Min latency: 0.035382
> >>>
> >>>
> >>> During the rados bench test:
> >>>
> >>> avg-cpu: %user %nice %system %iowait %steal %idle
> >>> 20,38 0,00 16,20 8,87 0,00 54,55
> >>>
> >>> Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
> >>> sda 0,00 41,20 0,00 12,40 0,00 0,35 57,42 0,00 0,31 0,00 0,31 0,31 0,38
> >>> sdb 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00
> >>> sdc 0,00 0,00 0,00 332,80 0,00 139,67 859,53 7,36 22,09 0,00 22,09 2,12 70,42
> >>> sdd 0,00 0,00 0,00 391,60 0,00 175,84 919,62 15,59 39,62 0,00 39,62 2,40 93,80
> >>> sde 0,00 0,00 0,00 342,00 0,00 147,39 882,59 8,54 24,89 0,00 24,89 2,18 74,58
> >>> sdf 0,00 0,00 0,00 362,20 0,00 162,72 920,05 15,35 42,50 0,00 42,50 2,60 94,20
> >>> sdg 0,00 0,00 0,00 522,00 0,00 139,20 546,13 0,28 0,54 0,00 0,54 0,10 5,26
> >>> sdh 0,00 0,00 0,00 672,00 0,00 179,20 546,13 9,67 14,42 0,00 14,42 0,61 41,18
> >>> sdi 0,00 0,00 0,00 555,00 0,00 148,00 546,13 0,32 0,57 0,00 0,57 0,10 5,46
> >>> sdj 0,00 0,00 0,00 582,00 0,00 155,20 546,13 0,51 0,87 0,00 0,87 0,12 6,96
> >>>
> >>> 100 seconds later
> >>>
> >>> avg-cpu: %user %nice %system %iowait %steal %idle
> >>> 22,92 0,00 19,57 9,25 0,00 48,25
> >>>
> >>> Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
> >>> sda 0,00 40,80 0,00 15,60 0,00 0,36 47,08 0,00 0,22 0,00 0,22 0,22 0,34
> >>> sdb 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00
> >>> sdc 0,00 0,00 0,00 386,60 0,00 168,33 891,70 12,11 31,08 0,00 31,08 2,25 86,86
> >>> sdd 0,00 0,00 0,00 405,00 0,00 183,06 925,68 15,68 38,70 0,00 38,70 2,34 94,90
> >>> sde 0,00 0,00 0,00 411,00 0,00 185,06 922,15 15,58 38,09 0,00 38,09 2,33 95,92
> >>> sdf 0,00 0,00 0,00 387,00 0,00 168,33 890,79 12,19 31,48 0,00 31,48 2,26 87,48
> >>> sdg 0,00 0,00 0,00 646,20 0,00 171,22 542,64 0,42 0,65 0,00 0,65 0,10 6,70
> >>> sdh 0,00 85,60 0,40 797,00 0,01 192,97 495,65 10,95 13,73 32,50 13,72 0,55 44,22
> >>> sdi 0,00 0,00 0,00 678,20 0,00 180,01 543,59 0,45 0,67 0,00 0,67 0,10 6,76
> >>> sdj 0,00 0,00 0,00 639,00 0,00 169,61 543,61 0,36 0,57 0,00 0,57 0,10 6,32
> >>>
> >>> ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok perf dump
> >>>
> >>> {"filestore":{"journal_queue_max_ops":500,"journal_queue_ops":0,"journal_ops":34653,"journal_queue_max_bytes":104857600,"journal_queue_bytes":0,"journal_bytes":86821481160,"journal_latency":{"avgcount":34653,"sum":3458.68},"journal_wr":19372,"journal_wr_bytes":{"avgcount":19372,"sum":87026655232},"op_queue_max_ops":500,"op_queue_ops":126,"ops":34653,"op_queue_max_bytes":104857600,"op_queue_bytes":167023,"bytes":86821143225,"apply_latency":{"avgcount":34527,"sum":605.768},"committing":0,"commitcycle":19,"commitcycle_interval":{"avgcount":19,"sum":572.674},"commitcycle_latency":{"avgcount":19,"sum":2.62279},"journal_full":0},"osd":{"opq":0,"op_wip":4,"op":15199,"op_in_bytes":36140461079,"op_out_bytes":0,"op_latency":{"avgcount":15199,"sum":1811.57},"op_r":0,"op_r_out_bytes":0,"op_r_latency":{"avgcount":0,"sum":0},"op_w":15199,"op_w_in_bytes":36140461079,"op_w_rlat":{"avgcount":15199,"sum":177.327},"op_w_latency":{"avgcount":15199,"sum":1811.57},"op_rw":0,"op_rw_in_bytes":
 0,
>  !
> >
> > "op!
> >>>
> >>>
> >>>
> >>> _rw_out_
> >>>
> >>> bytes":0,"op_rw_rlat":{"avgcount":0,"sum":0},"op_rw_latency":{"avgcount":0,"sum":0},"subop":19258,"subop_in_bytes":50617887978,"subop_latency":{"avgcount":19258,"sum":2110.84},"subop_w":0,"subop_w_in_bytes":50617887978,"subop_w_latency":{"avgcount":19258,"sum":2110.84},"subop_pull":0,"subop_pull_latency":{"avgcount":0,"sum":0},"subop_push":0,"subop_push_in_bytes":0,"subop_push_latency":{"avgcount":0,"sum":0},"pull":0,"push":0,"push_out_bytes":0,"push_in":0,"push_in_bytes":0,"recovery_ops":0,"loadavg":7.36,"buffer_bytes":0,"numpg":501,"numpg_primary":214,"numpg_replica":287,"numpg_stray":0,"heartbeat_to_peers":3,"heartbeat_from_peers":0,"map_messages":10,"map_message_epochs":34,"map_message_epoch_dups":28},"throttle-filestore_bytes":{"val":0,"max":104857600,"get":0,"get_sum":0,"get_or_fail_fail":0,"get_or_fail_success":0,"take":34653,"take_sum":86821481160,"put":19377,"put_sum":86821481160,"wait":{"avgcount":0,"sum":0}},"throttle-filestore_ops":{"val":0,"max":500,"get":0
 ,"
>  !
> >
> > get!
> >>>
> >>>
> >>>
> >>> _sum":0,"
> >>>
> >>> get_or_fail_fail":0,"get_or_fail_success":0,"take":34653,"take_sum":34653,"put":19377,"put_sum":34653,"wait":{"avgcount":0,"sum":0}},"throttle-msgr_dispatch_throttler-client":{"val":0,"max":104857600,"get":13798,"get_sum":36132522860,"get_or_fail_fail":0,"get_or_fail_success":0,"take":0,"take_sum":0,"put":13798,"put_sum":36132522860,"wait":{"avgcount":0,"sum":0}},"throttle-msgr_dispatch_throttler-cluster":{"val":0,"max":104857600,"get":43047,"get_sum":50644298035,"get_or_fail_fail":0,"get_or_fail_success":0,"take":0,"take_sum":0,"put":43047,"put_sum":50644298035,"wait":{"avgcount":0,"sum":0}},"throttle-msgr_dispatch_throttler-hbclient":{"val":0,"max":104857600,"get":924,"get_sum":43428,"get_or_fail_fail":0,"get_or_fail_success":0,"take":0,"take_sum":0,"put":924,"put_sum":43428,"wait":{"avgcount":0,"sum":0}},"throttle-msgr_dispatch_throttler-hbserver":{"val":0,"max":104857600,"get":924,"get_sum":43428,"get_or_fail_fail":0,"get_or_fail_success":0,"take":0,"take_sum":0,"pu
 t"
>  !
> >
> > :92!
> >>>
> >>>
> >>>
> >>> 4,"put_su
> >>>
> >>> m":43428,"wait":{"avgcount":0,"sum":0}},"throttle-osd_client_bytes":{"val":0,"max":524288000,"get":13676,"get_sum":36132351923,"get_or_fail_fail":0,"get_or_fail_success":0,"take":0,"take_sum":0,"put":22290,"put_sum":36132351923,"wait":{"avgcount":0,"sum":0}}}
> >>>
> >>>
> >>>
> >>> ceph.conf
> >>>
> >>> [global]
> >>> pid file = /var/run/$name.pid
> >>> debug ms = 0
> >>> [mon]
> >>> mon data = /data/mon.$id
> >>> mon lease wiggle room = 1.0
> >>> debug mon = 1
> >>> [mon.a]
> >>> host = 01-00-25-90-79-cc-ba
> >>> mon addr = 10.100.42.111:6789
> >>> [osd]
> >>> osd data = /data/osd.$id
> >>> debug osd = 1
> >>> debug filestore = 1
> >>> filestore max sync interval = 30
> >>> filestore btrfs snap = true
> >>> journal aio = true
> >>> [osd.0]
> >>> host = 01-00-25-90-79-cc-ba
> >>> osd journal = /dev/sdc1
> >>> btrfs devs = /dev/sdg
> >>> btrfs options = noatime,nodatacow,autodefrag
> >>> [osd.1]
> >>> host = 01-00-25-90-79-cc-ba
> >>> osd journal = /dev/sdd1
> >>> btrfs devs = /dev/sdh
> >>> btrfs options = noatime,nodatacow,autodefrag
> >>> [osd.2]
> >>> host = 01-00-25-90-79-cc-ba
> >>> osd journal = /dev/sde1
> >>> btrfs devs = /dev/sdi
> >>> btrfs options = noatime,nodatacow,autodefrag
> >>> [osd.3]
> >>> host = 01-00-25-90-79-cc-ba
> >>> osd journal = /dev/sdf1
> >>> btrfs devs = /dev/sdj
> >>> btrfs options = noatime,nodatacow,autodefrag
> >>
> >>
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

