Hi Stefan,

>> Currently I'm in the process of switching back from jemalloc to tcmalloc
>> as suggested. This report makes me a little nervous about my change.

Well, I'm really not sure that it's a tcmalloc bug; maybe it's bluestore related (I don't have filestore anymore to compare). I need to compare with bigger latencies.

Here is an example where all osds were at 20-50ms before a restart, then after the restart (at 21:15) at 1ms:

http://odisoweb1.odiso.net/latencybad.png

I observe the latency in my guest vms too, as disk iowait:

http://odisoweb1.odiso.net/latencybadvm.png

>> Also I'm currently only monitoring latency for filestore osds. Which
>> exact values out of the daemon do you use for bluestore?

Here are my influxdb queries. They take op_latency.sum / op_latency.avgcount, derived over the last second:

SELECT non_negative_derivative(first("op_latency.sum"), 1s) / non_negative_derivative(first("op_latency.avgcount"), 1s)
FROM "ceph"
WHERE "host" =~ /^([[host]])$/ AND "id" =~ /^([[osd]])$/ AND $timeFilter
GROUP BY time($interval), "host", "id" fill(previous)

SELECT non_negative_derivative(first("op_w_latency.sum"), 1s) / non_negative_derivative(first("op_w_latency.avgcount"), 1s)
FROM "ceph"
WHERE "host" =~ /^([[host]])$/ AND collection='osd' AND "id" =~ /^([[osd]])$/ AND $timeFilter
GROUP BY time($interval), "host", "id" fill(previous)

SELECT non_negative_derivative(first("op_w_process_latency.sum"), 1s) / non_negative_derivative(first("op_w_process_latency.avgcount"), 1s)
FROM "ceph"
WHERE "host" =~ /^([[host]])$/ AND collection='osd' AND "id" =~ /^([[osd]])$/ AND $timeFilter
GROUP BY time($interval), "host", "id" fill(previous)
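If you prefer to read these counters straight from the daemon instead of going through InfluxDB, a minimal sketch that does the same sum/avgcount math over a 10-second window could look like the script below. Assumptions: it runs on the OSD host (so the admin socket is reachable via `ceph daemon`), and osd.0 is a hypothetical id to adjust for your cluster.

#!/usr/bin/env python3
# Minimal sketch: average op_w_latency of one OSD over a 10s window,
# using the same sum/avgcount counters as the InfluxDB queries above.
import json
import subprocess
import time

OSD = "osd.0"  # hypothetical OSD id; adjust for your cluster

def counters(osd, name):
    """Return (sum, avgcount) for one latency counter from `perf dump`."""
    dump = json.loads(subprocess.check_output(["ceph", "daemon", osd, "perf", "dump"]))
    c = dump["osd"][name]
    return c["sum"], c["avgcount"]

s1, n1 = counters(OSD, "op_w_latency")
time.sleep(10)
s2, n2 = counters(OSD, "op_w_latency")

if n2 > n1:
    # sum is in seconds; convert the per-op average to milliseconds
    print("%s avg op_w_latency: %.2f ms" % (OSD, 1000.0 * (s2 - s1) / (n2 - n1)))
else:
    print("no write ops completed in the interval")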
----- Original Message -----
From: "Stefan Priebe, Profihost AG" <s.priebe@xxxxxxxxxxxx>
To: "aderumier" <aderumier@xxxxxxxxx>, "Sage Weil" <sage@xxxxxxxxxxxx>
Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>
Sent: Wednesday, January 30, 2019 08:45:33
Subject: Re: ceph osd commit latency increase over time, until restart

Hi,

On 30.01.19 at 08:33, Alexandre DERUMIER wrote:
> Hi,
>
> Here are some new results, from a different osd on a different cluster:
>
> before the osd restart, latency was between 2-5ms
> after the osd restart, it is around 1-1.5ms
>
> http://odisoweb1.odiso.net/cephperf2/bad.txt (2-5ms)
> http://odisoweb1.odiso.net/cephperf2/ok.txt (1-1.5ms)
> http://odisoweb1.odiso.net/cephperf2/diff.txt
>
> From what I see in the diff, the biggest difference is in tcmalloc, but maybe I'm wrong.
> (I'm using tcmalloc 2.5-2.2)

Currently I'm in the process of switching back from jemalloc to tcmalloc as suggested. This report makes me a little nervous about my change.

Also I'm currently only monitoring latency for filestore osds. Which exact values out of the daemon do you use for bluestore? I would like to check if I see the same behaviour.

Greets,
Stefan

>
> ----- Original Message -----
> From: "Sage Weil" <sage@xxxxxxxxxxxx>
> To: "aderumier" <aderumier@xxxxxxxxx>
> Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>
> Sent: Friday, January 25, 2019 10:49:02
> Subject: Re: ceph osd commit latency increase over time, until restart
>
> Can you capture a perf top or perf record to see where the CPU time is
> going on one of the OSDs with a high latency?
>
> Thanks!
> sage
>
>
> On Fri, 25 Jan 2019, Alexandre DERUMIER wrote:
>
>> Hi,
>>
>> I'm seeing a strange behaviour on my osds, on multiple clusters.
>>
>> All clusters are running mimic 13.2.1 with bluestore, on ssd or nvme drives.
>> The workload is rbd only, with qemu-kvm vms running on librbd + snapshot / rbd export-diff / snapshot delete each day for backups.
>>
>> When the osds are freshly started, the commit latency is between 0.5-1ms.
>>
>> But over time, this latency increases slowly (maybe around 1ms per day), until it reaches crazy values like 20-200ms.
>>
>> Some example graphs:
>>
>> http://odisoweb1.odiso.net/osdlatency1.png
>> http://odisoweb1.odiso.net/osdlatency2.png
>>
>> All osds show this behaviour, in all clusters.
>>
>> The latency of the physical disks is ok (the clusters are far from fully loaded).
>>
>> And if I restart an osd, its latency comes back to 0.5-1ms.
>>
>> That reminds me of the old tcmalloc bug, but could it maybe be a bluestore memory bug?
>>
>> Any hints for counters/logs to check?
>>
>>
>> Regards,
>>
>> Alexandre
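For reference, the capture Sage asks for above might look like the sketch below: it records 30 seconds of call-graph samples from a high-latency ceph-osd and prints the hot code paths. Assumptions: Linux perf is installed, it runs as root on the OSD host, and there is a single ceph-osd process on that host; with several OSDs per host, substitute the PID of the slow daemon by hand.

#!/usr/bin/env python3
# Minimal sketch: profile one ceph-osd with Linux perf for 30 seconds.
import subprocess

# pidof returns all ceph-osd PIDs; we assume there is exactly one.
pid = subprocess.check_output(["pidof", "ceph-osd"]).decode().split()[0]

# Sample the daemon with call graphs for 30 seconds...
subprocess.check_call(["perf", "record", "-g", "-p", pid, "--", "sleep", "30"])

# ...then print the report (the interactive `perf report` works too).
subprocess.check_call(["perf", "report", "--stdio"])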