Hi Stefan,

>>Currently I'm in the process of switching back from jemalloc to tcmalloc,
>>as suggested. This report makes me a little nervous about my change.

Well, I'm really not sure that it's a tcmalloc bug.
Maybe it's bluestore related (I don't have filestore anymore to compare).
I need to compare with bigger latencies.

Here is an example where all OSDs were at 20-50ms before the restart, then at 1ms after the restart (at 21:15):
http://odisoweb1.odiso.net/latencybad.png

I observe the latency in my guest VMs too, as disk iowait:
http://odisoweb1.odiso.net/latencybadvm.png

>>Also, I'm currently only monitoring latency for filestore OSDs. Which
>>exact values out of the daemon do you use for bluestore?

Here are my InfluxDB queries.
They take op_latency.sum / op_latency.avgcount over the last second.

SELECT non_negative_derivative(first("op_latency.sum"), 1s)/non_negative_derivative(first("op_latency.avgcount"),1s)
  FROM "ceph"
  WHERE "host" =~ /^([[host]])$/ AND "id" =~ /^([[osd]])$/ AND $timeFilter
  GROUP BY time($interval), "host", "id" fill(previous)

SELECT non_negative_derivative(first("op_w_latency.sum"), 1s)/non_negative_derivative(first("op_w_latency.avgcount"),1s)
  FROM "ceph"
  WHERE "host" =~ /^([[host]])$/ AND collection='osd' AND "id" =~ /^([[osd]])$/ AND $timeFilter
  GROUP BY time($interval), "host", "id" fill(previous)

SELECT non_negative_derivative(first("op_w_process_latency.sum"), 1s)/non_negative_derivative(first("op_w_process_latency.avgcount"),1s)
  FROM "ceph"
  WHERE "host" =~ /^([[host]])$/ AND collection='osd' AND "id" =~ /^([[osd]])$/ AND $timeFilter
  GROUP BY time($interval), "host", "id" fill(previous)

----- Original Message -----
From: "Stefan Priebe, Profihost AG" <s.priebe@xxxxxxxxxxxx>
To: "aderumier" <aderumier@xxxxxxxxx>, "Sage Weil" <sage@xxxxxxxxxxxx>
Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>
Sent: Wednesday, 30 January 2019 08:45:33
Subject: Re: [ceph-users] ceph osd commit latency increase over time, until restart

Hi,

On 30.01.19 at 08:33, Alexandre DERUMIER wrote:
> Hi,
>
> here are some new results,
> from a different osd / different cluster:
>
> before the osd restart, latency was between 2-5ms
> after the osd restart, it is around 1-1.5ms
>
> http://odisoweb1.odiso.net/cephperf2/bad.txt (2-5ms)
> http://odisoweb1.odiso.net/cephperf2/ok.txt (1-1.5ms)
> http://odisoweb1.odiso.net/cephperf2/diff.txt
>
> From what I see in the diff, the biggest difference is in tcmalloc, but maybe I'm wrong.
> (I'm using tcmalloc 2.5-2.2)

Currently I'm in the process of switching back from jemalloc to tcmalloc,
as suggested. This report makes me a little nervous about my change.

Also, I'm currently only monitoring latency for filestore OSDs. Which
exact values out of the daemon do you use for bluestore?

I would like to check if I see the same behaviour.

Greets,
Stefan

>
> ----- Original Message -----
> From: "Sage Weil" <sage@xxxxxxxxxxxx>
> To: "aderumier" <aderumier@xxxxxxxxx>
> Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>
> Sent: Friday, 25 January 2019 10:49:02
> Subject: Re: ceph osd commit latency increase over time, until restart
>
> Can you capture a perf top or perf record to see where the CPU time is
> going on one of the OSDs with a high latency?
>
> Thanks!
> sage
>
>
> On Fri, 25 Jan 2019, Alexandre DERUMIER wrote:
>
>>
>> Hi,
>>
>> I see a strange behaviour on my OSDs, on multiple clusters.
>>
>> All clusters are running mimic 13.2.1, bluestore, with ssd or nvme drives;
>> workload is rbd only, with qemu-kvm vms running with librbd + snapshot/rbd export-diff/snapshotdelete each day for backup
>>
>> When the OSDs are freshly started, the commit latency is between 0.5 and 1ms.
>>
>> But over time, this latency increases slowly (maybe around 1ms per day), until it reaches crazy
>> values like 20-200ms.
>>
>> Some example graphs:
>>
>> http://odisoweb1.odiso.net/osdlatency1.png
>> http://odisoweb1.odiso.net/osdlatency2.png
>>
>> All OSDs have this behaviour, in all clusters.
>>
>> The latency of the physical disks is ok. (The clusters are far from fully loaded.)
>>
>> And if I restart the OSD, the latency comes back to 0.5-1ms.
>>
>> That reminds me of an old tcmalloc bug, but maybe it could be a bluestore memory bug?
>>
>> Any hints for counters/logs to check?
>>
>>
>> Regards,
>>
>> Alexandre
>>
>>
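
For reference, the InfluxDB queries at the top of this thread divide the per-second change of op_latency.sum by the per-second change of op_latency.avgcount. A minimal Python sketch of the same arithmetic on two samples of the cumulative counters could look like the following. The function name interval_latency and the sample values are hypothetical, and it assumes each counter exposes cumulative "sum" (in seconds) and "avgcount" fields, as the queries imply.

```python
from typing import Mapping, Optional


def interval_latency(prev: Mapping[str, float],
                     curr: Mapping[str, float]) -> Optional[float]:
    """Average latency (seconds per op) between two counter samples.

    Both samples are assumed to hold cumulative "sum" and "avgcount"
    values for the same counter (for example op_latency).
    """
    d_sum = curr["sum"] - prev["sum"]
    d_count = curr["avgcount"] - prev["avgcount"]
    # A negative delta means the counters were reset (e.g. OSD restart);
    # a zero count delta means no ops completed in the interval.
    # In both cases there is no meaningful average to report.
    if d_sum < 0 or d_count <= 0:
        return None
    return d_sum / d_count


# Hypothetical samples taken a few seconds apart on one OSD.
before = {"sum": 10.0, "avgcount": 1000}
after = {"sum": 10.5, "avgcount": 1500}
latency = interval_latency(before, after)
if latency is not None:
    print(f"{latency * 1000:.3f} ms per op")
```

Returning None after a counter reset roughly plays the role that non_negative_derivative plays in the queries: the sample right after an OSD restart is skipped instead of producing a large negative spike.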