Some suggestions:
Monitor raw resources such as CPU %util, raw disk %util/busy, and raw
disk IOPS.
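For example, something like this (a sketch only; iostat comes from sysstat, and sda-sdf are assumed to be the OSD data disks):

iostat -xm /dev/sd{a..f} 5   # extended per-disk stats (%util, iops) plus CPU, every 5s; device names assumed
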
Instead of running a mix of workloads at this stage, narrow it
down first, for example to RBD random writes with a 4k block size,
then change one parameter at a time, e.g. the block size.
See how your cluster performs and what resource loads you get
step by step. Latency at 4M will not be the same as at 4k.
I would also run fio tests on the raw Nytro 1551 devices,
including sync writes.
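For example, something along these lines (a sketch only; /dev/sdX is a placeholder for one raw Nytro 1551 device, and the run will overwrite data on it):

# WARNING: destructive to the device; /dev/sdX is a placeholder
fio --name=rawsyncwrite --filename=/dev/sdX --rw=randwrite --bs=4k --ioengine=libaio --iodepth=1 --numjobs=1 --direct=1 --sync=1 --runtime=60 --time_based --group_reporting
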
I would not recommend increasing readahead for random I/O.
I do not recommend making RAID0.
/Maged
On 01/10/2019 02:12, Sasha Litvak wrote:
At this point, I ran out of ideas. I changed the
nr_requests and readahead parameters from 128->1024 and
128->4096 respectively, and tuned the nodes to performance-throughput. However, I
still get high latency during benchmark testing. I attempted to
disable the cache on the SSDs with

for i in {a..f}; do hdparm -W 0 -A 0 /dev/sd$i; done

and I think it did not make things any better at all. I have H740
and H730 controllers with the drives in HBA mode.
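For reference, a sketch of what those queue changes look like (assuming the sysfs nr_requests and read_ahead_kb knobs on devices sda-sdf):

# assumed devices sda-sdf; nr_requests 128->1024, readahead 128->4096 KB
for i in {a..f}; do echo 1024 > /sys/block/sd$i/queue/nr_requests; echo 4096 > /sys/block/sd$i/queue/read_ahead_kb; done
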
Other than converting them one by one to RAID0, I am not
sure what else I can try.
Any suggestions?
BTW:
commit and apply latency are the exact same thing since
BlueStore, so don't bother looking at both.
In fact, you should mostly be looking at the op_*_latency
counters.
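For example, something like this against the OSD admin socket (a sketch; osd.0 and the jq filter are illustrative, not prescribed):

# osd id and jq usage are assumptions; prints the per-op latency counters Paul mentions
ceph daemon osd.0 perf dump | jq '.osd | {op_r_latency, op_w_latency, op_latency}'
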
Paul
--
Paul Emmerich
Looking for help with your Ceph cluster? Contact us at https://croit.io
croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
On Mon, Sep 30, 2019 at 8:46 PM Sasha Litvak <alexander.v.litvak@xxxxxxxxx> wrote:
>
> In my case, I am using premade Prometheus-sourced dashboards in Grafana.
>
> For individual latency, the query looks like this:
>
>
> irate(ceph_osd_op_r_latency_sum{ceph_daemon=~"$osd"}[1m]) / on (ceph_daemon) irate(ceph_osd_op_r_latency_count[1m])
> irate(ceph_osd_op_w_latency_sum{ceph_daemon=~"$osd"}[1m]) / on (ceph_daemon) irate(ceph_osd_op_w_latency_count[1m])
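>
> The raw counters behind these can also be spot-checked straight from the mgr prometheus endpoint; a sketch, assuming the module's default port 9283 on the active mgr host:
>
> curl -s http://storage2n2-la:9283/metrics | grep ceph_osd_op_w_latency   # host and port are assumptions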
>
> The other ones use
>
> ceph_osd_commit_latency_ms
> ceph_osd_apply_latency_ms
>
> and graph its distribution over time.
>
> Also, average OSD op latency
>
>
> avg(rate(ceph_osd_op_r_latency_sum{cluster="$cluster"}[5m]) / rate(ceph_osd_op_r_latency_count{cluster="$cluster"}[5m]) >= 0)
>
> avg(rate(ceph_osd_op_w_latency_sum{cluster="$cluster"}[5m]) / rate(ceph_osd_op_w_latency_count{cluster="$cluster"}[5m]) >= 0)
>
> Average OSD apply + commit latency
> avg(ceph_osd_apply_latency_ms{cluster="$cluster"})
> avg(ceph_osd_commit_latency_ms{cluster="$cluster"})
>
>
> On Mon, Sep 30, 2019 at 11:13 AM Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx> wrote:
>>
>>
>> What parameters exactly are you using? I want to do a similar test on
>> Luminous before I upgrade to Nautilus. I have quite a lot (74+):
>>
>> type_instance=Osd.opBeforeDequeueOpLat
>> type_instance=Osd.opBeforeQueueOpLat
>> type_instance=Osd.opLatency
>> type_instance=Osd.opPrepareLatency
>> type_instance=Osd.opProcessLatency
>> type_instance=Osd.opRLatency
>> type_instance=Osd.opRPrepareLatency
>> type_instance=Osd.opRProcessLatency
>> type_instance=Osd.opRwLatency
>> type_instance=Osd.opRwPrepareLatency
>> type_instance=Osd.opRwProcessLatency
>> type_instance=Osd.opWLatency
>> type_instance=Osd.opWPrepareLatency
>> type_instance=Osd.opWProcessLatency
>> type_instance=Osd.subopLatency
>> type_instance=Osd.subopWLatency
>> ...
>> ...
>>
>>
>>
>>
>>
>> -----Original Message-----
>> From: Alex Litvak [mailto:alexander.v.litvak@xxxxxxxxx]
>> Sent: Sunday, 29 September 2019 13:06
>> To: ceph-users@xxxxxxxxxxxxxx
>> Cc: ceph-devel@xxxxxxxxxxxxxxx
>> Subject: Commit and Apply latency on nautilus
>>
>> Hello everyone,
>>
>> I am running a number of parallel benchmark tests against the cluster
>> that should be ready to go to production.
>> I enabled Prometheus to monitor various information, and while the cluster
>> stays healthy through the tests with no errors or slow requests,
>> I noticed apply / commit latency jumping between 40 - 600 ms on
>> multiple SSDs. At the same time, op_read and op_write are on average
>> below 0.25 ms in the worst-case scenario.
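>>
>> For a quick spot check outside Prometheus, the same per-OSD commit/apply numbers can also be pulled on the CLI; a sketch, reusing the mon container shown further below:
>>
>> podman exec -it ceph-mon-storage2n2-la ceph osd perf   # prints commit/apply latency (ms) per OSD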
>>
>> I am running Nautilus 14.2.2, all BlueStore, no separate NVMe devices
>> for WAL/DB, 6 SSDs per node (Dell PowerEdge R440) with all drives Seagate
>> Nytro 1551, OSDs spread across 6 nodes, running in containers. Each node
>> has plenty of RAM, with utilization ~ 25 GB during the benchmark runs.
>>
>> Here are the benchmarks being run from 6 client systems in parallel,
>> repeating the test for each block size in <4k,16k,128k,4M>.
>>
>> On an rbd-mapped partition local to each client:
>>
>> fio --name=randrw --ioengine=libaio --iodepth=4 --rw=randrw --bs=<4k,16k,128k,4M> --direct=1 --size=2G --numjobs=8 --runtime=300 --group_reporting --time_based --rwmixread=70
>>
>> On a mounted CephFS volume, with each client storing its test file(s) in its
>> own sub-directory:
>>
>> fio --name=randrw --ioengine=libaio --iodepth=4 --rw=randrw --bs=<4k,16k,128k,4M> --direct=1 --size=2G --numjobs=8 --runtime=300 --group_reporting --time_based --rwmixread=70
>>
>> dbench -t 30 30
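>>
>> Each block size can be driven by a simple loop, e.g. (a sketch with the same fio parameters as above):
>>
>> for bs in 4k 16k 128k 4M; do fio --name=randrw --ioengine=libaio --iodepth=4 --rw=randrw --bs=$bs --direct=1 --size=2G --numjobs=8 --runtime=300 --group_reporting --time_based --rwmixread=70; done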
>>
>> Could you please let me know if the huge jump in apply and commit
>> latency is justified in my case, and whether I can do anything to improve
>> / fix it. Below is some additional cluster info.
>>
>> Thank you,
>>
>> root@storage2n2-la:~# podman exec -it ceph-mon-storage2n2-la ceph osd df
>> ID CLASS WEIGHT  REWEIGHT SIZE    RAW USE DATA    OMAP    META     AVAIL   %USE VAR  PGS STATUS
>>  6   ssd 1.74609  1.00000 1.7 TiB  93 GiB  92 GiB 240 MiB  784 MiB 1.7 TiB 5.21 0.90  44 up
>> 12   ssd 1.74609  1.00000 1.7 TiB  98 GiB  97 GiB 118 MiB  906 MiB 1.7 TiB 5.47 0.95  40 up
>> 18   ssd 1.74609  1.00000 1.7 TiB 102 GiB 101 GiB 123 MiB  901 MiB 1.6 TiB 5.73 0.99  47 up
>> 24   ssd 3.49219  1.00000 3.5 TiB 222 GiB 221 GiB 134 MiB  890 MiB 3.3 TiB 6.20 1.07  96 up
>> 30   ssd 3.49219  1.00000 3.5 TiB 213 GiB 212 GiB 151 MiB  873 MiB 3.3 TiB 5.95 1.03  93 up
>> 35   ssd 3.49219  1.00000 3.5 TiB 203 GiB 202 GiB 301 MiB  723 MiB 3.3 TiB 5.67 0.98 100 up
>>  5   ssd 1.74609  1.00000 1.7 TiB 103 GiB 102 GiB 123 MiB  901 MiB 1.6 TiB 5.78 1.00  49 up
>> 11   ssd 1.74609  1.00000 1.7 TiB 109 GiB 108 GiB  63 MiB  961 MiB 1.6 TiB 6.09 1.05  46 up
>> 17   ssd 1.74609  1.00000 1.7 TiB 104 GiB 103 GiB 205 MiB  819 MiB 1.6 TiB 5.81 1.01  50 up
>> 23   ssd 3.49219  1.00000 3.5 TiB 210 GiB 209 GiB 168 MiB  856 MiB 3.3 TiB 5.86 1.01  86 up
>> 29   ssd 3.49219  1.00000 3.5 TiB 204 GiB 203 GiB 272 MiB  752 MiB 3.3 TiB 5.69 0.98  92 up
>> 34   ssd 3.49219  1.00000 3.5 TiB 198 GiB 197 GiB 295 MiB  729 MiB 3.3 TiB 5.54 0.96  85 up
>>  4   ssd 1.74609  1.00000 1.7 TiB 119 GiB 118 GiB  16 KiB 1024 MiB 1.6 TiB 6.67 1.15  50 up
>> 10   ssd 1.74609  1.00000 1.7 TiB  95 GiB  94 GiB 183 MiB  841 MiB 1.7 TiB 5.31 0.92  46 up
>> 16   ssd 1.74609  1.00000 1.7 TiB 102 GiB 101 GiB 122 MiB  902 MiB 1.6 TiB 5.72 0.99  50 up
>> 22   ssd 3.49219  1.00000 3.5 TiB 218 GiB 217 GiB 109 MiB  915 MiB 3.3 TiB 6.11 1.06  91 up
>> 28   ssd 3.49219  1.00000 3.5 TiB 198 GiB 197 GiB 343 MiB  681 MiB 3.3 TiB 5.54 0.96  95 up
>> 33   ssd 3.49219  1.00000 3.5 TiB 198 GiB 196 GiB 297 MiB 1019 MiB 3.3 TiB 5.53 0.96  85 up
>>  1   ssd 1.74609  1.00000 1.7 TiB 101 GiB 100 GiB 222 MiB  802 MiB 1.6 TiB 5.63 0.97  49 up
>>  7   ssd 1.74609  1.00000 1.7 TiB 102 GiB 101 GiB 153 MiB  871 MiB 1.6 TiB 5.69 0.99  46 up
>> 13   ssd 1.74609  1.00000 1.7 TiB 106 GiB 105 GiB  67 MiB  957 MiB 1.6 TiB 5.96 1.03  42 up
>> 19   ssd 3.49219  1.00000 3.5 TiB 206 GiB 205 GiB 179 MiB  845 MiB 3.3 TiB 5.77 1.00  83 up
>> 25   ssd 3.49219  1.00000 3.5 TiB 195 GiB 194 GiB 352 MiB  672 MiB 3.3 TiB 5.45 0.94  97 up
>> 31   ssd 3.49219  1.00000 3.5 TiB 201 GiB 200 GiB 305 MiB  719 MiB 3.3 TiB 5.62 0.97  90 up
>>  0   ssd 1.74609  1.00000 1.7 TiB 110 GiB 109 GiB  29 MiB  995 MiB 1.6 TiB 6.14 1.06  43 up
>>  3   ssd 1.74609  1.00000 1.7 TiB 109 GiB 108 GiB  28 MiB  996 MiB 1.6 TiB 6.07 1.05  41 up
>>  9   ssd 1.74609  1.00000 1.7 TiB 103 GiB 102 GiB 149 MiB  875 MiB 1.6 TiB 5.76 1.00  52 up
>> 15   ssd 3.49219  1.00000 3.5 TiB 209 GiB 208 GiB 253 MiB  771 MiB 3.3 TiB 5.83 1.01  98 up
>> 21   ssd 3.49219  1.00000 3.5 TiB 199 GiB 198 GiB 302 MiB  722 MiB 3.3 TiB 5.56 0.96  90 up
>> 27   ssd 3.49219  1.00000 3.5 TiB 208 GiB 207 GiB 226 MiB  798 MiB 3.3 TiB 5.81 1.00  95 up
>>  2   ssd 1.74609  1.00000 1.7 TiB  96 GiB  95 GiB 158 MiB  866 MiB 1.7 TiB 5.35 0.93  45 up
>>  8   ssd 1.74609  1.00000 1.7 TiB 106 GiB 105 GiB 132 MiB  892 MiB 1.6 TiB 5.91 1.02  50 up
>> 14   ssd 1.74609  1.00000 1.7 TiB  96 GiB  95 GiB 180 MiB  844 MiB 1.7 TiB 5.35 0.92  46 up
>> 20   ssd 3.49219  1.00000 3.5 TiB 221 GiB 220 GiB 156 MiB  868 MiB 3.3 TiB 6.18 1.07 101 up
>> 26   ssd 3.49219  1.00000 3.5 TiB 206 GiB 205 GiB 332 MiB  692 MiB 3.3 TiB 5.76 1.00  92 up
>> 32   ssd 3.49219  1.00000 3.5 TiB 221 GiB 220 GiB  88 MiB  936 MiB 3.3 TiB 6.18 1.07  91 up
>>          TOTAL    94 TiB 5.5 TiB 5.4 TiB 6.4 GiB   30 GiB  89 TiB 5.78
>> MIN/MAX VAR: 0.90/1.15  STDDEV: 0.30
>>
>>
>> root@storage2n2-la:~# podman exec -it ceph-mon-storage2n2-la ceph -s
>>   cluster:
>>     id:     9b4468b7-5bf2-4964-8aec-4b2f4bee87ad
>>     health: HEALTH_OK
>>
>>   services:
>>     mon: 3 daemons, quorum storage2n1-la,storage2n2-la,storage2n3-la (age 9w)
>>     mgr: storage2n2-la(active, since 9w), standbys: storage2n1-la, storage2n3-la
>>     mds: cephfs:1 {0=storage2n6-la=up:active} 1 up:standby-replay 1 up:standby
>>     osd: 36 osds: 36 up (since 9w), 36 in (since 9w)
>>
>>   data:
>>     pools:   3 pools, 832 pgs
>>     objects: 4.18M objects, 1.8 TiB
>>     usage:   5.5 TiB used, 89 TiB / 94 TiB avail
>>     pgs:     832 active+clean
>>
>>   io:
>>     client:   852 B/s rd, 15 KiB/s wr, 4 op/s rd, 2 op/s wr
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com