At this point, I have run out of ideas. I changed nr_requests from 128 to 1024 and readahead from 128 to 4096, and applied the throughput-performance tuned profile to the nodes. However, I still see high latency during benchmark testing. I also attempted to disable caching on the SSDs:
for i in {a..f}; do hdparm -W 0 -A 0 /dev/sd$i; done   # -W 0 disables the write cache, -A 0 disables read-lookahead
and I don't think it made things any better. I have H740 and H730 controllers with the drives in HBA mode.
Other than converting them one by one to RAID0, I am not sure what else I can try.
Any suggestions?
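In case it is useful, this is roughly how the nr_requests and readahead changes mentioned above were applied (these are the standard block-queue sysfs knobs; the device names are just my drives, adjust as needed):

for i in {a..f}; do
  echo 1024 > /sys/block/sd$i/queue/nr_requests    # was 128
  echo 4096 > /sys/block/sd$i/queue/read_ahead_kb  # was 128
done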
On Mon, Sep 30, 2019 at 2:45 PM Paul Emmerich <paul.emmerich@xxxxxxxx> wrote:
BTW: commit and apply latency are the exact same thing since
BlueStore, so don't bother looking at both.
In fact you should mostly be looking at the op_*_latency counters
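If you have access to the OSD admin sockets (from inside the OSD containers in your setup), a quick way to peek at those counters is something along these lines; the OSD id and the jq filter are only an example:

# the op_* latency counters live in the "osd" section of perf dump
ceph daemon osd.0 perf dump | jq '.osd | {op_r_latency, op_w_latency, op_latency}'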
Paul
--
Paul Emmerich
Looking for help with your Ceph cluster? Contact us at https://croit.io
croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
On Mon, Sep 30, 2019 at 8:46 PM Sasha Litvak
<alexander.v.litvak@xxxxxxxxx> wrote:
>
> In my case, I am using premade Prometheus-sourced dashboards in Grafana.
>
> For individual OSD latency, the queries look like this:
>
> irate(ceph_osd_op_r_latency_sum{ceph_daemon=~"$osd"}[1m]) / on (ceph_daemon) irate(ceph_osd_op_r_latency_count[1m])
> irate(ceph_osd_op_w_latency_sum{ceph_daemon=~"$osd"}[1m]) / on (ceph_daemon) irate(ceph_osd_op_w_latency_count[1m])
>
> The other ones use
>
> ceph_osd_commit_latency_ms
> ceph_osd_apply_latency_ms
>
> and graph its distribution over time.
>
> Also, average OSD op latency
>
> avg(rate(ceph_osd_op_r_latency_sum{cluster="$cluster"}[5m]) / rate(ceph_osd_op_r_latency_count{cluster="$cluster"}[5m]) >= 0)
> avg(rate(ceph_osd_op_w_latency_sum{cluster="$cluster"}[5m]) / rate(ceph_osd_op_w_latency_count{cluster="$cluster"}[5m]) >= 0)
>
> Average OSD apply + commit latency
> avg(ceph_osd_apply_latency_ms{cluster="$cluster"})
> avg(ceph_osd_commit_latency_ms{cluster="$cluster"})
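>
> One thing to keep in mind when comparing these: if the exporter reports the op_*_latency sums in seconds (as the underlying perf counters are), while commit/apply are in milliseconds, you can scale the op query to put them on the same graph, e.g.:
>
> 1000 * irate(ceph_osd_op_w_latency_sum{ceph_daemon=~"$osd"}[1m]) / on (ceph_daemon) irate(ceph_osd_op_w_latency_count{ceph_daemon=~"$osd"}[1m])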
>
>
> On Mon, Sep 30, 2019 at 11:13 AM Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx> wrote:
>>
>>
>> What parameters exactly are you using? I want to do a similar test on
>> Luminous before I upgrade to Nautilus. I have quite a lot of them (74+):
>>
>> type_instance=Osd.opBeforeDequeueOpLat
>> type_instance=Osd.opBeforeQueueOpLat
>> type_instance=Osd.opLatency
>> type_instance=Osd.opPrepareLatency
>> type_instance=Osd.opProcessLatency
>> type_instance=Osd.opRLatency
>> type_instance=Osd.opRPrepareLatency
>> type_instance=Osd.opRProcessLatency
>> type_instance=Osd.opRwLatency
>> type_instance=Osd.opRwPrepareLatency
>> type_instance=Osd.opRwProcessLatency
>> type_instance=Osd.opWLatency
>> type_instance=Osd.opWPrepareLatency
>> type_instance=Osd.opWProcessLatency
>> type_instance=Osd.subopLatency
>> type_instance=Osd.subopWLatency
>> ...
>> ...
>>
>>
>>
>>
>>
>> -----Original Message-----
>> From: Alex Litvak [mailto:alexander.v.litvak@xxxxxxxxx]
>> Sent: Sunday, 29 September 2019 13:06
>> To: ceph-users@xxxxxxxxxxxxxx
>> Cc: ceph-devel@xxxxxxxxxxxxxxx
>> Subject: Commit and Apply latency on nautilus
>>
>> Hello everyone,
>>
>> I am running a number of parallel benchmark tests against the cluster
>> that should be ready to go to production.
>> I enabled Prometheus to monitor various metrics, and while the cluster
>> stays healthy through the tests with no errors or slow requests,
>> I noticed apply / commit latency jumping between 40 and 600 ms on
>> multiple SSDs. At the same time, op_read and op_write latencies are on
>> average below 0.25 ms in the worst-case scenario.
>>
>> I am running Nautilus 14.2.2, all BlueStore, no separate NVMe devices
>> for WAL/DB, 6 SSDs per node (Dell PowerEdge R440) with all drives Seagate
>> Nytro 1551, OSDs spread across 6 nodes, running in
>> containers. Each node has plenty of RAM, with utilization around 25 GB
>> during the benchmark runs.
>>
>> Here are the benchmarks being run from 6 client systems in parallel,
>> repeating the test for each block size in <4k,16k,128k,4M>.
>>
>> On an RBD-mapped partition local to each client:
>>
>> fio --name=randrw --ioengine=libaio --iodepth=4 --rw=randrw
>> --bs=<4k,16k,128k,4M> --direct=1 --size=2G --numjobs=8 --runtime=300
>> --group_reporting --time_based --rwmixread=70
>>
>> On a mounted CephFS volume, with each client storing its test file(s) in
>> its own sub-directory:
>>
>> fio --name=randrw --ioengine=libaio --iodepth=4 --rw=randrw
>> --bs=<4k,16k,128k,4M> --direct=1 --size=2G --numjobs=8 --runtime=300
>> --group_reporting --time_based --rwmixread=70
>>
>> dbench -t 30 30
>>
>> Could you please let me know whether such a large jump in apply and commit
>> latency is expected in my case, and whether I can do anything to improve
>> / fix it? Below is some additional cluster info.
>>
>> Thank you,
>>
>> root@storage2n2-la:~# podman exec -it ceph-mon-storage2n2-la ceph osd df
>> ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
>>  6 ssd 1.74609 1.00000 1.7 TiB 93 GiB 92 GiB 240 MiB 784 MiB 1.7 TiB 5.21 0.90 44 up
>> 12 ssd 1.74609 1.00000 1.7 TiB 98 GiB 97 GiB 118 MiB 906 MiB 1.7 TiB 5.47 0.95 40 up
>> 18 ssd 1.74609 1.00000 1.7 TiB 102 GiB 101 GiB 123 MiB 901 MiB 1.6 TiB 5.73 0.99 47 up
>> 24 ssd 3.49219 1.00000 3.5 TiB 222 GiB 221 GiB 134 MiB 890 MiB 3.3 TiB 6.20 1.07 96 up
>> 30 ssd 3.49219 1.00000 3.5 TiB 213 GiB 212 GiB 151 MiB 873 MiB 3.3 TiB 5.95 1.03 93 up
>> 35 ssd 3.49219 1.00000 3.5 TiB 203 GiB 202 GiB 301 MiB 723 MiB 3.3 TiB 5.67 0.98 100 up
>>  5 ssd 1.74609 1.00000 1.7 TiB 103 GiB 102 GiB 123 MiB 901 MiB 1.6 TiB 5.78 1.00 49 up
>> 11 ssd 1.74609 1.00000 1.7 TiB 109 GiB 108 GiB 63 MiB 961 MiB 1.6 TiB 6.09 1.05 46 up
>> 17 ssd 1.74609 1.00000 1.7 TiB 104 GiB 103 GiB 205 MiB 819 MiB 1.6 TiB 5.81 1.01 50 up
>> 23 ssd 3.49219 1.00000 3.5 TiB 210 GiB 209 GiB 168 MiB 856 MiB 3.3 TiB 5.86 1.01 86 up
>> 29 ssd 3.49219 1.00000 3.5 TiB 204 GiB 203 GiB 272 MiB 752 MiB 3.3 TiB 5.69 0.98 92 up
>> 34 ssd 3.49219 1.00000 3.5 TiB 198 GiB 197 GiB 295 MiB 729 MiB 3.3 TiB 5.54 0.96 85 up
>>  4 ssd 1.74609 1.00000 1.7 TiB 119 GiB 118 GiB 16 KiB 1024 MiB 1.6 TiB 6.67 1.15 50 up
>> 10 ssd 1.74609 1.00000 1.7 TiB 95 GiB 94 GiB 183 MiB 841 MiB 1.7 TiB 5.31 0.92 46 up
>> 16 ssd 1.74609 1.00000 1.7 TiB 102 GiB 101 GiB 122 MiB 902 MiB 1.6 TiB 5.72 0.99 50 up
>> 22 ssd 3.49219 1.00000 3.5 TiB 218 GiB 217 GiB 109 MiB 915 MiB 3.3 TiB 6.11 1.06 91 up
>> 28 ssd 3.49219 1.00000 3.5 TiB 198 GiB 197 GiB 343 MiB 681 MiB 3.3 TiB 5.54 0.96 95 up
>> 33 ssd 3.49219 1.00000 3.5 TiB 198 GiB 196 GiB 297 MiB 1019 MiB 3.3 TiB 5.53 0.96 85 up
>>  1 ssd 1.74609 1.00000 1.7 TiB 101 GiB 100 GiB 222 MiB 802 MiB 1.6 TiB 5.63 0.97 49 up
>>  7 ssd 1.74609 1.00000 1.7 TiB 102 GiB 101 GiB 153 MiB 871 MiB 1.6 TiB 5.69 0.99 46 up
>> 13 ssd 1.74609 1.00000 1.7 TiB 106 GiB 105 GiB 67 MiB 957 MiB 1.6 TiB 5.96 1.03 42 up
>> 19 ssd 3.49219 1.00000 3.5 TiB 206 GiB 205 GiB 179 MiB 845 MiB 3.3 TiB 5.77 1.00 83 up
>> 25 ssd 3.49219 1.00000 3.5 TiB 195 GiB 194 GiB 352 MiB 672 MiB 3.3 TiB 5.45 0.94 97 up
>> 31 ssd 3.49219 1.00000 3.5 TiB 201 GiB 200 GiB 305 MiB 719 MiB 3.3 TiB 5.62 0.97 90 up
>>  0 ssd 1.74609 1.00000 1.7 TiB 110 GiB 109 GiB 29 MiB 995 MiB 1.6 TiB 6.14 1.06 43 up
>>  3 ssd 1.74609 1.00000 1.7 TiB 109 GiB 108 GiB 28 MiB 996 MiB 1.6 TiB 6.07 1.05 41 up
>>  9 ssd 1.74609 1.00000 1.7 TiB 103 GiB 102 GiB 149 MiB 875 MiB 1.6 TiB 5.76 1.00 52 up
>> 15 ssd 3.49219 1.00000 3.5 TiB 209 GiB 208 GiB 253 MiB 771 MiB 3.3 TiB 5.83 1.01 98 up
>> 21 ssd 3.49219 1.00000 3.5 TiB 199 GiB 198 GiB 302 MiB 722 MiB 3.3 TiB 5.56 0.96 90 up
>> 27 ssd 3.49219 1.00000 3.5 TiB 208 GiB 207 GiB 226 MiB 798 MiB 3.3 TiB 5.81 1.00 95 up
>>  2 ssd 1.74609 1.00000 1.7 TiB 96 GiB 95 GiB 158 MiB 866 MiB 1.7 TiB 5.35 0.93 45 up
>>  8 ssd 1.74609 1.00000 1.7 TiB 106 GiB 105 GiB 132 MiB 892 MiB 1.6 TiB 5.91 1.02 50 up
>> 14 ssd 1.74609 1.00000 1.7 TiB 96 GiB 95 GiB 180 MiB 844 MiB 1.7 TiB 5.35 0.92 46 up
>> 20 ssd 3.49219 1.00000 3.5 TiB 221 GiB 220 GiB 156 MiB 868 MiB 3.3 TiB 6.18 1.07 101 up
>> 26 ssd 3.49219 1.00000 3.5 TiB 206 GiB 205 GiB 332 MiB 692 MiB 3.3 TiB 5.76 1.00 92 up
>> 32 ssd 3.49219 1.00000 3.5 TiB 221 GiB 220 GiB 88 MiB 936 MiB 3.3 TiB 6.18 1.07 91 up
>> TOTAL 94 TiB 5.5 TiB 5.4 TiB 6.4 GiB 30 GiB 89 TiB 5.78
>> MIN/MAX VAR: 0.90/1.15 STDDEV: 0.30
>>
>>
>> root@storage2n2-la:~# podman exec -it ceph-mon-storage2n2-la ceph -s
>>   cluster:
>>     id:     9b4468b7-5bf2-4964-8aec-4b2f4bee87ad
>>     health: HEALTH_OK
>>
>>   services:
>>     mon: 3 daemons, quorum storage2n1-la,storage2n2-la,storage2n3-la (age 9w)
>>     mgr: storage2n2-la(active, since 9w), standbys: storage2n1-la, storage2n3-la
>>     mds: cephfs:1 {0=storage2n6-la=up:active} 1 up:standby-replay 1 up:standby
>>     osd: 36 osds: 36 up (since 9w), 36 in (since 9w)
>>
>>   data:
>>     pools:   3 pools, 832 pgs
>>     objects: 4.18M objects, 1.8 TiB
>>     usage:   5.5 TiB used, 89 TiB / 94 TiB avail
>>     pgs:     832 active+clean
>>
>>   io:
>>     client: 852 B/s rd, 15 KiB/s wr, 4 op/s rd, 2 op/s wr
>>
>>
>>
>>
>>