Hi,

On 30.01.19 at 14:59, Alexandre DERUMIER wrote:
> Hi Stefan,
>
>>> currently i'm in the process of switching back from jemalloc to tcmalloc
>>> like suggested. This report makes me a little nervous about my change.
>
> Well, I'm really not sure that it's a tcmalloc bug.
> Maybe it's bluestore related (I don't have filestore anymore to compare).
> I need to compare with bigger latencies.
>
> Here is an example: all osds were at 20-50ms before the restart, then after the restart (at 21:15) they were at 1ms:
>
> http://odisoweb1.odiso.net/latencybad.png
>
> I observe the latency in my guest vms too, as disk iowait:
>
> http://odisoweb1.odiso.net/latencybadvm.png
>
>>> Also i'm currently only monitoring latency for filestore osds. Which
>>> exact values out of the daemon do you use for bluestore?
>
> Here are my InfluxDB queries.
>
> They take op_latency.sum / op_latency.avgcount over the last second:
>
> SELECT non_negative_derivative(first("op_latency.sum"), 1s)/non_negative_derivative(first("op_latency.avgcount"),1s) FROM "ceph" WHERE "host" =~ /^([[host]])$/ AND "id" =~ /^([[osd]])$/ AND $timeFilter GROUP BY time($interval), "host", "id" fill(previous)
>
> SELECT non_negative_derivative(first("op_w_latency.sum"), 1s)/non_negative_derivative(first("op_w_latency.avgcount"),1s) FROM "ceph" WHERE "host" =~ /^([[host]])$/ AND collection='osd' AND "id" =~ /^([[osd]])$/ AND $timeFilter GROUP BY time($interval), "host", "id" fill(previous)
>
> SELECT non_negative_derivative(first("op_w_process_latency.sum"), 1s)/non_negative_derivative(first("op_w_process_latency.avgcount"),1s) FROM "ceph" WHERE "host" =~ /^([[host]])$/ AND collection='osd' AND "id" =~ /^([[osd]])$/ AND $timeFilter GROUP BY time($interval), "host", "id" fill(previous)

Thanks. Is there any reason you monitor op_w_latency and the combined op_latency, but not op_r_latency? Likewise, why do you monitor op_w_process_latency but not op_r_process_latency?
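The derivative queries above boil down to delta(sum) / delta(avgcount) between two samples of the OSD's perf counters. A minimal Python sketch of the same computation; the snapshot values are made-up numbers standing in for two `ceph daemon osd.<id> perf dump` outputs, not real measurements:

```python
def interval_latency(prev, curr, counter="op_latency"):
    """Mean latency (seconds/op) between two perf-dump snapshots.

    Mirrors the InfluxDB non_negative_derivative queries:
    delta of "sum" (total seconds spent on ops) divided by
    delta of "avgcount" (number of ops completed).
    """
    d_sum = curr[counter]["sum"] - prev[counter]["sum"]
    d_count = curr[counter]["avgcount"] - prev[counter]["avgcount"]
    if d_count == 0:
        return 0.0  # no ops completed in the interval; avoid division by zero
    return d_sum / d_count

# Two hypothetical snapshots taken one second apart:
t0 = {"op_latency": {"sum": 100.0, "avgcount": 50000}}
t1 = {"op_latency": {"sum": 100.2, "avgcount": 50100}}

print(interval_latency(t0, t1))  # ~0.002 s, i.e. 2 ms/op over the interval
```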
greets,
Stefan

> ----- Original Message -----
> From: "Stefan Priebe, Profihost AG" <s.priebe@xxxxxxxxxxxx>
> To: "aderumier" <aderumier@xxxxxxxxx>, "Sage Weil" <sage@xxxxxxxxxxxx>
> Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>
> Sent: Wednesday, 30 January 2019 08:45:33
> Subject: Re: ceph osd commit latency increase over time, until restart
>
> Hi,
>
> On 30.01.19 at 08:33, Alexandre DERUMIER wrote:
>> Hi,
>>
>> here are some new results,
>> from a different osd on a different cluster:
>>
>> before the osd restart, latency was between 2-5ms;
>> after the osd restart, it is around 1-1.5ms.
>>
>> http://odisoweb1.odiso.net/cephperf2/bad.txt (2-5ms)
>> http://odisoweb1.odiso.net/cephperf2/ok.txt (1-1.5ms)
>> http://odisoweb1.odiso.net/cephperf2/diff.txt
>>
>> From what I see in the diff, the biggest difference is in tcmalloc, but maybe I'm wrong.
>> (I'm using tcmalloc 2.5-2.2)
>
> currently i'm in the process of switching back from jemalloc to tcmalloc,
> like suggested. This report makes me a little nervous about my change.
>
> Also, i'm currently only monitoring latency for filestore osds. Which
> exact values out of the daemon do you use for bluestore?
>
> I would like to check whether i see the same behaviour.
>
> Greets,
> Stefan
>
>> ----- Original Message -----
>> From: "Sage Weil" <sage@xxxxxxxxxxxx>
>> To: "aderumier" <aderumier@xxxxxxxxx>
>> Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>
>> Sent: Friday, 25 January 2019 10:49:02
>> Subject: Re: ceph osd commit latency increase over time, until restart
>>
>> Can you capture a perf top or perf record to see where the CPU time is
>> going on one of the OSDs with a high latency?
>>
>> Thanks!
>> sage
>>
>> On Fri, 25 Jan 2019, Alexandre DERUMIER wrote:
>>
>>> Hi,
>>>
>>> I am seeing strange behaviour from my osds, on multiple clusters.
>>>
>>> All clusters are running mimic 13.2.1 with bluestore, on ssd or nvme drives.
>>> The workload is rbd only, with qemu-kvm vms running with librbd + snapshot/rbd export-diff/snapshot delete each day for backup.
>>>
>>> When the osds are freshly started, the commit latency is between 0.5-1ms.
>>>
>>> But over time this latency increases slowly (maybe around 1ms per day), until reaching crazy
>>> values like 20-200ms.
>>>
>>> Some example graphs:
>>>
>>> http://odisoweb1.odiso.net/osdlatency1.png
>>> http://odisoweb1.odiso.net/osdlatency2.png
>>>
>>> All osds show this behaviour, in all clusters.
>>>
>>> The latency of the physical disks is fine. (The clusters are far from fully loaded.)
>>>
>>> And if I restart the osd, the latency comes back to 0.5-1ms.
>>>
>>> That reminds me of the old tcmalloc bug, but maybe it could be a bluestore memory bug?
>>>
>>> Any hints for counters/logs to check?
>>>
>>> Regards,
>>>
>>> Alexandre

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
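The symptom in the original report, commit latency creeping up from a roughly 0.5-1ms fresh-restart baseline until the OSD is restarted, lends itself to a simple watchdog on the monitored values. A hypothetical sketch, not part of the thread's tooling; the osd names, baseline, and threshold factor are illustrative only:

```python
def drifted_osds(latencies_ms, baseline_ms=1.0, factor=5.0):
    """Return osd ids whose current commit latency has drifted well above
    the fresh-restart baseline (and so may need a restart/investigation)."""
    return sorted(osd for osd, lat in latencies_ms.items()
                  if lat > baseline_ms * factor)

# Example: osd.2 has crept up to 20ms; the others are still near baseline.
current = {"osd.0": 0.8, "osd.1": 1.2, "osd.2": 20.0}
print(drifted_osds(current))  # ['osd.2']
```

In practice the latency values would come from the same op_latency.sum / op_latency.avgcount counters discussed earlier in the thread, sampled per OSD.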