Can you capture a perf top or perf record to see where the CPU time is going on one of the OSDs with a high latency?

Thanks!
sage

On Fri, 25 Jan 2019, Alexandre DERUMIER wrote:
>
> Hi,
>
> I'm seeing strange behaviour from my OSDs on multiple clusters.
>
> All clusters are running mimic 13.2.1, bluestore, with SSD or NVMe drives.
> The workload is RBD only, with qemu-kvm VMs running with librbd, plus a daily
> snapshot / rbd export-diff / snapshot delete cycle for backups.
>
> When the OSDs are freshly started, the commit latency is between 0.5 and 1 ms.
>
> But over time, this latency slowly increases (maybe around 1 ms per day), until it
> reaches crazy values like 20-200 ms.
>
> Some example graphs:
>
> http://odisoweb1.odiso.net/osdlatency1.png
> http://odisoweb1.odiso.net/osdlatency2.png
>
> All OSDs have this behaviour, in all clusters.
>
> The latency of the physical disks is OK. (The clusters are far from fully loaded.)
>
> And if I restart an OSD, its latency comes back to 0.5-1 ms.
>
> This reminds me of the old tcmalloc bug, but could it be a bluestore memory bug?
>
> Any hints for counters/logs to check?
>
> Regards,
>
> Alexandre
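On the question of which counters to check: a long-running average counter's lifetime average covers everything since the OSD started, so it tends to smooth over a slow drift like the one described. One way to see the *current* latency is to take two snapshots of the OSD's perf counters some seconds apart and divide the change in `sum` by the change in `avgcount`. The sketch below does that. It assumes the JSON layout of `ceph daemon osd.<id> perf dump`, where such counters look like `{"avgcount": <ops>, "sum": <seconds>, ...}`, and it assumes a `bluestore` section with a `commit_lat` counter. Check those names against your own dump before relying on them. The helper names `perf_dump` and `interval_avg_latency_ms` are hypothetical.

```python
import json
import subprocess


def perf_dump(osd_id: int) -> dict:
    """Hypothetical helper: fetch one perf-counter snapshot via the admin socket.

    Must run on the host where the OSD's admin socket lives.
    """
    out = subprocess.check_output(["ceph", "daemon", f"osd.{osd_id}", "perf", "dump"])
    return json.loads(out)


def interval_avg_latency_ms(before: dict, after: dict,
                            section: str = "bluestore",
                            counter: str = "commit_lat") -> float:
    """Average latency in ms of the ops that completed between two snapshots.

    Assumes the counter is a long-running average of the form
    {"avgcount": <op count>, "sum": <total seconds>}; verify the section and
    counter names against your own `perf dump` output.
    """
    b = before[section][counter]
    a = after[section][counter]
    ops = a["avgcount"] - b["avgcount"]
    if ops <= 0:
        return 0.0  # no ops completed in the interval (or the counters were reset)
    # seconds -> milliseconds, divided by ops completed in the interval
    return (a["sum"] - b["sum"]) * 1000.0 / ops
```

For example, you could call `perf_dump(3)`, sleep for 10 seconds, call it again, and pass both snapshots to `interval_avg_latency_ms`. Logging that value periodically should show whether the per-interval commit latency climbs steadily between restarts, which the lifetime average would tend to hide.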