Re: ceph osd commit latency increase over time, until restart

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



also, here the result of "perf diff 1mslatency.perfdata  3mslatency.perfdata"

http://odisoweb1.odiso.net/perf_diff_ok_vs_bad.txt




----- Mail original -----
De: "aderumier" <aderumier@xxxxxxxxx>
À: "Sage Weil" <sage@xxxxxxxxxxxx>
Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>
Envoyé: Vendredi 25 Janvier 2019 17:32:02
Objet: Re:  ceph osd commit latency increase over time, until restart

Hi again, 

I was able to perf it today, 

before restart, commit latency was between 3-5ms 

after restart at 17:11, latency is around 1ms 

http://odisoweb1.odiso.net/osd3_latency_3ms_vs_1ms.png 


here some perf reports: 

with 3ms latency: 
----------------- 
perf report by caller: http://odisoweb1.odiso.net/bad-caller.txt 
perf report by callee: http://odisoweb1.odiso.net/bad-callee.txt 


with 1ms latency 
----------------- 
perf report by caller: http://odisoweb1.odiso.net/ok-caller.txt 
perf report by callee: http://odisoweb1.odiso.net/ok-callee.txt 



I'll retry next week, trying to have bigger latency difference. 

Alexandre 

----- Mail original ----- 
De: "aderumier" <aderumier@xxxxxxxxx> 
À: "Sage Weil" <sage@xxxxxxxxxxxx> 
Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx> 
Envoyé: Vendredi 25 Janvier 2019 11:06:51 
Objet: Re: ceph osd commit latency increase over time, until restart 

>>Can you capture a perf top or perf record to see where teh CPU time is 
>>going on one of the OSDs wth a high latency? 

Yes, sure. I'll do it next week and send result to the mailing list. 

Thanks Sage ! 

----- Mail original ----- 
De: "Sage Weil" <sage@xxxxxxxxxxxx> 
À: "aderumier" <aderumier@xxxxxxxxx> 
Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx> 
Envoyé: Vendredi 25 Janvier 2019 10:49:02 
Objet: Re: ceph osd commit latency increase over time, until restart 

Can you capture a perf top or perf record to see where teh CPU time is 
going on one of the OSDs wth a high latency? 

Thanks! 
sage 


On Fri, 25 Jan 2019, Alexandre DERUMIER wrote: 

> 
> Hi, 
> 
> I have a strange behaviour of my osd, on multiple clusters, 
> 
> All cluster are running mimic 13.2.1,bluestore, with ssd or nvme drivers, 
> workload is rbd only, with qemu-kvm vms running with librbd + snapshot/rbd export-diff/snapshotdelete each day for backup 
> 
> When the osd are refreshly started, the commit latency is between 0,5-1ms. 
> 
> But overtime, this latency increase slowly (maybe around 1ms by day), until reaching crazy 
> values like 20-200ms. 
> 
> Some example graphs: 
> 
> http://odisoweb1.odiso.net/osdlatency1.png 
> http://odisoweb1.odiso.net/osdlatency2.png 
> 
> All osds have this behaviour, in all clusters. 
> 
> The latency of physical disks is ok. (Clusters are far to be full loaded) 
> 
> And if I restart the osd, the latency come back to 0,5-1ms. 
> 
> That's remember me old tcmalloc bug, but maybe could it be a bluestore memory bug ? 
> 
> Any Hints for counters/logs to check ? 
> 
> 
> Regards, 
> 
> Alexandre 
> 
> 

_______________________________________________ 
ceph-users mailing list 
ceph-users@xxxxxxxxxxxxxx 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Ceph Dev]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux