Hi,

On 30.01.19 at 14:59, Alexandre DERUMIER wrote:
> Hi Stefan,
>
>>> currently i'm in the process of switching back from jemalloc to tcmalloc
>>> like suggested. This report makes me a little nervous about my change.
>
> Well, I'm really not sure that it's a tcmalloc bug.
> Maybe it's bluestore related (I don't have filestore anymore to compare).
> I need to compare with bigger latencies.
>
> Here is an example: all osds were at 20-50ms before the restart, then after the restart (at 21:15) they were at 1ms:
>
> http://odisoweb1.odiso.net/latencybad.png
>
> I observe the latency in my guest vms too, as disk iowait:
>
> http://odisoweb1.odiso.net/latencybadvm.png
>
>>> Also i'm currently only monitoring latency for filestore osds. Which
>>> exact values out of the daemon do you use for bluestore?
>
> Here are my InfluxDB queries.
>
> They take op_latency.sum / op_latency.avgcount over the last second:
>
> SELECT non_negative_derivative(first("op_latency.sum"), 1s)/non_negative_derivative(first("op_latency.avgcount"),1s) FROM "ceph" WHERE "host" =~ /^([[host]])$/ AND "id" =~ /^([[osd]])$/ AND $timeFilter GROUP BY time($interval), "host", "id" fill(previous)
>
> SELECT non_negative_derivative(first("op_w_latency.sum"), 1s)/non_negative_derivative(first("op_w_latency.avgcount"),1s) FROM "ceph" WHERE "host" =~ /^([[host]])$/ AND collection='osd' AND "id" =~ /^([[osd]])$/ AND $timeFilter GROUP BY time($interval), "host", "id" fill(previous)
>
> SELECT non_negative_derivative(first("op_w_process_latency.sum"), 1s)/non_negative_derivative(first("op_w_process_latency.avgcount"),1s) FROM "ceph" WHERE "host" =~ /^([[host]])$/ AND collection='osd' AND "id" =~ /^([[osd]])$/ AND $timeFilter GROUP BY time($interval), "host", "id" fill(previous)

Thanks. Is there any reason you monitor op_w_latency and the combined op_latency, but not op_r_latency? Likewise, why do you monitor op_w_process_latency but not op_r_process_latency?
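The derivative queries above boil down to delta(sum) / delta(avgcount) between two samples of the OSD's perf counters. A minimal Python sketch of the same computation; the snapshot values are made-up numbers standing in for two `ceph daemon osd.<id> perf dump` outputs, not real measurements:

```python
def interval_latency(prev, curr, counter="op_latency"):
    """Mean latency (seconds/op) between two perf-dump snapshots.

    Mirrors the InfluxDB non_negative_derivative queries:
    delta of "sum" (total seconds spent on ops) divided by
    delta of "avgcount" (number of ops completed).
    """
    d_sum = curr[counter]["sum"] - prev[counter]["sum"]
    d_count = curr[counter]["avgcount"] - prev[counter]["avgcount"]
    if d_count == 0:
        return 0.0  # no ops completed in the interval; avoid division by zero
    return d_sum / d_count

# Two hypothetical snapshots taken one second apart:
t0 = {"op_latency": {"sum": 100.0, "avgcount": 50000}}
t1 = {"op_latency": {"sum": 100.2, "avgcount": 50100}}

print(interval_latency(t0, t1))  # ~0.002 s, i.e. 2 ms/op over the interval
```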
greets,
Stefan

> ----- Original Message -----
> From: "Stefan Priebe, Profihost AG" <s.priebe@xxxxxxxxxxxx>
> To: "aderumier" <aderumier@xxxxxxxxx>, "Sage Weil" <sage@xxxxxxxxxxxx>
> Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>
> Sent: Wednesday, 30 January 2019 08:45:33
> Subject: Re: ceph osd commit latency increase over time, until restart
>
> Hi,
>
> On 30.01.19 at 08:33, Alexandre DERUMIER wrote:
>> Hi,
>>
>> here are some new results,
>> from a different osd on a different cluster:
>>
>> before the osd restart, latency was between 2-5ms;
>> after the osd restart, it is around 1-1.5ms.
>>
>> http://odisoweb1.odiso.net/cephperf2/bad.txt (2-5ms)
>> http://odisoweb1.odiso.net/cephperf2/ok.txt (1-1.5ms)
>> http://odisoweb1.odiso.net/cephperf2/diff.txt
>>
>> From what I see in the diff, the biggest difference is in tcmalloc, but maybe I'm wrong.
>> (I'm using tcmalloc 2.5-2.2)
>
> currently i'm in the process of switching back from jemalloc to tcmalloc,
> like suggested. This report makes me a little nervous about my change.
>
> Also, i'm currently only monitoring latency for filestore osds. Which
> exact values out of the daemon do you use for bluestore?
>
> I would like to check whether i see the same behaviour.
>
> Greets,
> Stefan
>
>> ----- Original Message -----
>> From: "Sage Weil" <sage@xxxxxxxxxxxx>
>> To: "aderumier" <aderumier@xxxxxxxxx>
>> Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>
>> Sent: Friday, 25 January 2019 10:49:02
>> Subject: Re: ceph osd commit latency increase over time, until restart
>>
>> Can you capture a perf top or perf record to see where the CPU time is
>> going on one of the OSDs with a high latency?
>>
>> Thanks!
>> sage
>>
>> On Fri, 25 Jan 2019, Alexandre DERUMIER wrote:
>>
>>> Hi,
>>>
>>> I am seeing strange behaviour from my osds, on multiple clusters.
>>>
>>> All clusters are running mimic 13.2.1 with bluestore, on ssd or nvme drives.
>>> The workload is rbd only, with qemu-kvm vms running with librbd + snapshot/rbd export-diff/snapshot delete each day for backup.
>>>
>>> When the osds are freshly started, the commit latency is between 0.5-1ms.
>>>
>>> But over time this latency increases slowly (maybe around 1ms per day), until reaching crazy
>>> values like 20-200ms.
>>>
>>> Some example graphs:
>>>
>>> http://odisoweb1.odiso.net/osdlatency1.png
>>> http://odisoweb1.odiso.net/osdlatency2.png
>>>
>>> All osds show this behaviour, in all clusters.
>>>
>>> The latency of the physical disks is fine. (The clusters are far from fully loaded.)
>>>
>>> And if I restart the osd, the latency comes back to 0.5-1ms.
>>>
>>> That reminds me of the old tcmalloc bug, but maybe it could be a bluestore memory bug?
>>>
>>> Any hints for counters/logs to check?
>>>
>>> Regards,
>>>
>>> Alexandre

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
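The symptom in the original report, commit latency creeping up from a roughly 0.5-1ms fresh-restart baseline until the OSD is restarted, lends itself to a simple watchdog on the monitored values. A hypothetical sketch, not part of the thread's tooling; the osd names, baseline, and threshold factor are illustrative only:

```python
def drifted_osds(latencies_ms, baseline_ms=1.0, factor=5.0):
    """Return osd ids whose current commit latency has drifted well above
    the fresh-restart baseline (and so may need a restart/investigation)."""
    return sorted(osd for osd, lat in latencies_ms.items()
                  if lat > baseline_ms * factor)

# Example: osd.2 has crept up to 20ms; the others are still near baseline.
current = {"osd.0": 0.8, "osd.1": 1.2, "osd.2": 20.0}
print(drifted_osds(current))  # ['osd.2']
```

In practice the latency values would come from the same op_latency.sum / op_latency.avgcount counters discussed earlier in the thread, sampled per OSD.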