Re: ceph osd commit latency increase over time, until restart

>>Thanks. Is there any reason you monitor op_w_latency and op_latency, but
>>not op_r_latency?
>>
>>Also, why do you monitor op_w_process_latency but not op_r_process_latency?

I monitor reads too. (I have all the metrics from the osd admin sockets, and a lot of graphs.)

I just don't see a latency difference on reads. (Or it is very small compared to the write latency increase.)
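
For reference, a minimal sketch (illustrative only; the osd id and the 10s interval are just examples) of how these counters can be read straight from the admin socket with "ceph daemon osd.<id> perf dump" and turned into a per-interval mean latency (delta sum / delta avgcount), which is the same thing the influxdb queries quoted below compute with non_negative_derivative:

#!/usr/bin/env python3
# Illustrative sketch: sample the OSD perf counters twice via the admin socket
# and compute the mean op latency over the interval (delta sum / delta avgcount).
# The osd id and the sleep interval below are only examples.
import json
import subprocess
import time

def perf_dump(osd_id):
    out = subprocess.check_output(["ceph", "daemon", "osd.%d" % osd_id, "perf", "dump"])
    return json.loads(out)["osd"]

def interval_latency_ms(before, after, counter):
    # Each latency counter exposes a running "sum" (seconds spent) and an
    # "avgcount" (number of ops); the ratio of the deltas is the mean latency.
    dsum = after[counter]["sum"] - before[counter]["sum"]
    dcount = after[counter]["avgcount"] - before[counter]["avgcount"]
    return 1000.0 * dsum / dcount if dcount else 0.0

osd_id = 0          # example osd id
a = perf_dump(osd_id)
time.sleep(10)      # example sampling interval
b = perf_dump(osd_id)
for c in ("op_r_latency", "op_w_latency", "op_w_process_latency"):
    print("%s: %.2f ms" % (c, interval_latency_ms(a, b, c)))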



----- Original Message -----
From: "Stefan Priebe, Profihost AG" <s.priebe@xxxxxxxxxxxx>
To: "aderumier" <aderumier@xxxxxxxxx>
Cc: "Sage Weil" <sage@xxxxxxxxxxxx>, "ceph-users" <ceph-users@xxxxxxxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>
Sent: Wednesday, 30 January 2019 19:50:20
Subject: Re: ceph osd commit latency increase over time, until restart

Hi, 

On 30.01.19 at 14:59, Alexandre DERUMIER wrote: 
> Hi Stefan, 
> 
>>> Currently I'm in the process of switching back from jemalloc to tcmalloc 
>>> as suggested. This report makes me a little nervous about my change. 
> Well, I'm really not sure that it's a tcmalloc bug. 
> Maybe it's bluestore related (I don't have filestore anymore to compare). 
> I need to compare with bigger latencies. 
> 
> Here is an example: all osds are at 20-50ms before the restart, then after the restart (at 21:15) they drop to 1ms: 
> http://odisoweb1.odiso.net/latencybad.png 
> 
> I observe the latency in my guest vms too, as disk iowait: 
> 
> http://odisoweb1.odiso.net/latencybadvm.png 
> 
>>> Also, I'm currently only monitoring latency for filestore osds. Which 
>>> exact values from the daemon do you use for bluestore? 
> 
> Here are my influxdb queries: 
> 
> They take op_latency.sum/op_latency.avgcount over the last second (and likewise for the other counters; see the worked example after the queries). 
> 
> 
> SELECT non_negative_derivative(first("op_latency.sum"), 1s)/non_negative_derivative(first("op_latency.avgcount"),1s) FROM "ceph" WHERE "host" =~ /^([[host]])$/ AND "id" =~ /^([[osd]])$/ AND $timeFilter GROUP BY time($interval), "host", "id" fill(previous) 
> 
> 
> SELECT non_negative_derivative(first("op_w_latency.sum"), 1s)/non_negative_derivative(first("op_w_latency.avgcount"),1s) FROM "ceph" WHERE "host" =~ /^([[host]])$/ AND collection='osd' AND "id" =~ /^([[osd]])$/ AND $timeFilter GROUP BY time($interval), "host", "id" fill(previous) 
> 
> 
> SELECT non_negative_derivative(first("op_w_process_latency.sum"), 1s)/non_negative_derivative(first("op_w_process_latency.avgcount"),1s) FROM "ceph" WHERE "host" =~ /^([[host]])$/ AND collection='osd' AND "id" =~ /^([[osd]])$/ AND $timeFilter GROUP BY time($interval), "host", "id" fill(previous) 

Thanks. Is there any reason you monitor op_w_latency and op_latency, but 
not op_r_latency? 

Also, why do you monitor op_w_process_latency but not op_r_process_latency? 

greets, 
Stefan 

> 
> 
> 
> 
> 
> ----- Original Message ----- 
> From: "Stefan Priebe, Profihost AG" <s.priebe@xxxxxxxxxxxx> 
> To: "aderumier" <aderumier@xxxxxxxxx>, "Sage Weil" <sage@xxxxxxxxxxxx> 
> Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx> 
> Sent: Wednesday, 30 January 2019 08:45:33 
> Subject: Re: ceph osd commit latency increase over time, until restart 
> 
> Hi, 
> 
> On 30.01.19 at 08:33, Alexandre DERUMIER wrote: 
>> Hi, 
>> 
>> Here are some new results, 
>> from a different osd / different cluster: 
>> 
>> Before the osd restart, latency was between 2-5ms; 
>> after the osd restart it is around 1-1.5ms. 
>> 
>> http://odisoweb1.odiso.net/cephperf2/bad.txt (2-5ms) 
>> http://odisoweb1.odiso.net/cephperf2/ok.txt (1-1.5ms) 
>> http://odisoweb1.odiso.net/cephperf2/diff.txt 
>> 
>> From what I see in the diff, the biggest difference is in tcmalloc, but maybe I'm wrong. 
>> (I'm using tcmalloc 2.5-2.2) 
> 
> Currently I'm in the process of switching back from jemalloc to tcmalloc 
> as suggested. This report makes me a little nervous about my change. 
> 
> Also, I'm currently only monitoring latency for filestore osds. Which 
> exact values from the daemon do you use for bluestore? 
> 
> I would like to check if I see the same behaviour. 
> 
> Greets, 
> Stefan 
> 
>> 
>> ----- Original Message ----- 
>> From: "Sage Weil" <sage@xxxxxxxxxxxx> 
>> To: "aderumier" <aderumier@xxxxxxxxx> 
>> Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx> 
>> Sent: Friday, 25 January 2019 10:49:02 
>> Subject: Re: ceph osd commit latency increase over time, until restart 
>> 
>> Can you capture a perf top or perf record to see where the CPU time is 
>> going on one of the OSDs with a high latency? 
>> 
>> Thanks! 
>> sage 
>> 
>> 
>> On Fri, 25 Jan 2019, Alexandre DERUMIER wrote: 
>> 
>>> 
>>> Hi, 
>>> 
>>> I'm seeing a strange behaviour of my osds, on multiple clusters. 
>>> 
>>> All clusters are running mimic 13.2.1 with bluestore, on ssd or nvme drives. 
>>> The workload is rbd only, with qemu-kvm vms using librbd + snapshot / rbd export-diff / snapshot delete each day for backup. 
>>> 
>>> When the osds are freshly started, the commit latency is between 0.5-1ms. 
>>> 
>>> But over time, this latency increases slowly (maybe around 1ms per day), until it reaches crazy 
>>> values like 20-200ms. 
>>> 
>>> Some example graphs: 
>>> 
>>> http://odisoweb1.odiso.net/osdlatency1.png 
>>> http://odisoweb1.odiso.net/osdlatency2.png 
>>> 
>>> All osds show this behaviour, in all clusters. 
>>> 
>>> The latency of the physical disks is fine. (The clusters are far from fully loaded.) 
>>> 
>>> And if I restart the osd, the latency comes back to 0.5-1ms. 
>>> 
>>> That reminds me of the old tcmalloc bug, but maybe it could be a bluestore memory bug? 
>>> 
>>> Any hints for counters/logs to check? 
>>> 
>>> 
>>> Regards, 
>>> 
>>> Alexandre 
>>> 
>>> 
>> 
>> _______________________________________________ 
>> ceph-users mailing list 
>> ceph-users@xxxxxxxxxxxxxx 
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
>> 
> 

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



