Re: how to find the lazy egg - poor performance - interesting observations [klartext]


 



On Wed, Nov 13, 2019 at 10:13 AM Stefan Bauer <sb@xxxxxxx> wrote:
>
> Paul,
>
>
> I would like to take the chance to thank you, and to ask: could it be that
> subop_latency reports a high value (is that avgtime, reported in seconds?)
> because the communication partner is slow at writing/committing?

no


Paul

>
>
> Don't want to follow a red herring :/
>
>
> We have the following times on our 11 OSDs (image attached).
>
>
>
> -----Original Message-----
> From: Paul Emmerich <paul.emmerich@xxxxxxxx>
> Sent: Thursday, 7 November 2019 19:04
> To: Stefan Bauer <stefan.bauer@xxxxxxxxxxx>
> CC: ceph-users@xxxxxxxxxxxxxx
> Subject: Re: how to find the lazy egg - poor performance - interesting observations [klartext]
>
> You can have a look at subop_latency in "ceph daemon osd.XX perf
> dump"; it tells you how long an OSD took to reply to another OSD.
> That's usually a good indicator of whether one OSD is dragging down the others.
> Or have a look at "ceph osd perf", which is basically disk
> latency; simpler to acquire, but with less information.
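>
> For example, a minimal sketch for pulling that counter from every OSD on one node (assuming jq is available, the admin sockets live under /var/run/ceph/, and the counter appears as .osd.subop_latency with an avgtime field, as on Luminous):
>
>     # print the average subop latency (in seconds) of each local OSD
>     for sock in /var/run/ceph/ceph-osd.*.asok; do
>         name=$(basename "$sock" .asok)        # e.g. ceph-osd.3
>         avg=$(ceph daemon "$sock" perf dump | jq -r '.osd.subop_latency.avgtime')
>         echo "$name subop_latency avgtime: $avg"
>     done
>
> An OSD whose avgtime is consistently higher than its peers' is the kind of outlier to look for.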
>
> Paul
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
>
> On Thu, Nov 7, 2019 at 6:55 PM Stefan Bauer <sb@xxxxxxx> wrote:
> >
> > Hi folks,
> >
> >
> > We are running a 3-node Proxmox cluster with - of course - Ceph :)
> >
> > ceph version 12.2.12 (39cfebf25a7011204a9876d2950e4b28aba66d11) luminous (stable)
> >
> >
> > 10G network. iperf reports almost 10G between all nodes.
> >
> >
> > We are using mixed standard SSDs (Crucial / Samsung). We are aware that these disks cannot deliver high IOPS or great throughput, but we have several of these clusters and this one is showing very poor performance.
> >
> >
> > Now for the strange part:
> >
> >
> > While one specific node is rebooting (its OSDs down), the throughput is acceptable.
> >
> >
> > But as soon as that node is back online, the throughput drops sharply: from roughly 160 MB/s to under 60 MB/s in the benchmarks below.
> >
> >
> > 2 NODES (one rebooting)
> >
> >
> > # rados bench -p scbench 10 write --no-cleanup
> > hints = 1
> > Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 10 seconds or 0 objects
> > Object prefix: benchmark_data_pve3_1767693
> >   sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
> >     0       0         0         0         0         0           -           0
> >     1      16        55        39   155.992       156   0.0445665    0.257988
> >     2      16       110        94    187.98       220    0.087097    0.291173
> >     3      16       156       140   186.645       184    0.462171    0.286895
> >     4      16       184       168    167.98       112   0.0235336    0.358085
> >     5      16       210       194   155.181       104    0.112401    0.347883
> >     6      16       252       236   157.314       168    0.134099    0.382159
> >     7      16       287       271   154.838       140   0.0264864     0.40092
> >     8      16       329       313   156.481       168   0.0609964    0.394753
> >     9      16       364       348   154.649       140    0.244309    0.392331
> >    10      16       416       400   159.981       208    0.277489    0.387424
> > Total time run:         10.335496
> > Total writes made:      417
> > Write size:             4194304
> > Object size:            4194304
> > Bandwidth (MB/sec):     161.386
> > Stddev Bandwidth:       37.8065
> > Max bandwidth (MB/sec): 220
> > Min bandwidth (MB/sec): 104
> > Average IOPS:           40
> > Stddev IOPS:            9
> > Max IOPS:               55
> > Min IOPS:               26
> > Average Latency(s):     0.396434
> > Stddev Latency(s):      0.428527
> > Max latency(s):         1.86968
> > Min latency(s):         0.020558
> >
> >
> >
> > THIRD NODE ONLINE:
> >
> >
> >
> > root@pve3:/# rados bench -p scbench 10 write --no-cleanup
> > hints = 1
> > Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 10 seconds or 0 objects
> > Object prefix: benchmark_data_pve3_1771977
> >   sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
> >     0       0         0         0         0         0           -           0
> >     1      16        39        23   91.9943        92     0.21353    0.267249
> >     2      16        46        30   59.9924        28     0.29527    0.268672
> >     3      16        53        37   49.3271        28    0.122732    0.259731
> >     4      16        53        37   36.9954         0           -    0.259731
> >     5      16        53        37   29.5963         0           -    0.259731
> >     6      16        87        71   47.3271   45.3333    0.241921     1.19831
> >     7      16       106        90   51.4214        76    0.124821     1.07941
> >     8      16       129       113    56.492        92   0.0314146    0.941378
> >     9      16       142       126   55.9919        52    0.285536    0.871445
> >    10      16       147       131   52.3925        20    0.354803    0.852074
> > Total time run:         10.138312
> > Total writes made:      148
> > Write size:             4194304
> > Object size:            4194304
> > Bandwidth (MB/sec):     58.3924
> > Stddev Bandwidth:       34.405
> > Max bandwidth (MB/sec): 92
> > Min bandwidth (MB/sec): 0
> > Average IOPS:           14
> > Stddev IOPS:            8
> > Max IOPS:               23
> > Min IOPS:               0
> > Average Latency(s):     1.08818
> > Stddev Latency(s):      1.55967
> > Max latency(s):         5.02514
> > Min latency(s):         0.0255947
> >
> >
> >
> > Is a single node at fault here?
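> >
> > One way to narrow that down would be (a rough sketch, assuming pve3 is the suspect node and the cluster can tolerate its OSDs being down for a few minutes):
> >
> >     ceph osd set noout                              # don't rebalance while the OSDs are down
> >     systemctl stop ceph-osd.target                  # on the suspect node: stop its OSDs
> >     rados bench -p scbench 10 write --no-cleanup    # repeat the benchmark
> >     systemctl start ceph-osd.target                 # bring the OSDs back
> >     ceph osd unset noout
> >
> > Repeating this for each node in turn should show whether a single host is dragging the benchmark down.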
> >
> >
> >
> > root@pve3:/# ceph status
> >   cluster:
> >     id:     138c857a-c4e6-4600-9320-9567011470d6
> >     health: HEALTH_WARN
> >             application not enabled on 1 pool(s) (that's just for benchmarking; see the note after the status output)
> >
> >   services:
> >     mon: 3 daemons, quorum pve1,pve2,pve3
> >     mgr: pve1(active), standbys: pve3, pve2
> >     osd: 12 osds: 12 up, 12 in
> >
> >   data:
> >     pools:   2 pools, 612 pgs
> >     objects: 758.52k objects, 2.89TiB
> >     usage:   8.62TiB used, 7.75TiB / 16.4TiB avail
> >     pgs:     611 active+clean
> >              1   active+clean+scrubbing+deep
> >
> >   io:
> >     client:   4.99MiB/s rd, 1.36MiB/s wr, 678op/s rd, 105op/s wr
> >
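> > As an aside, that HEALTH_WARN can be cleared by tagging the benchmark pool with an application, e.g. (assuming the rbd tag is appropriate for the scbench pool):
> >
> >     ceph osd pool application enable scbench rbd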
> >
> >
> > Thank you.
> >
> >
> > Stefan
> >
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



