Re: ceph osd commit latency increase over time, until restart

Hi Alexandre, 

I was curious whether I have a similar issue. Which value are you monitoring?
I have quite a lot to choose from:


Bluestore.commitLat
Bluestore.kvLat
Bluestore.readLat
Bluestore.readOnodeMetaLat
Bluestore.readWaitAioLat
Bluestore.stateAioWaitLat
Bluestore.stateDoneLat
Bluestore.stateIoDoneLat
Bluestore.submitLat
Bluestore.throttleLat
Osd.opBeforeDequeueOpLat
Osd.opRProcessLatency
Osd.opWProcessLatency
Osd.subopLatency
Osd.subopWLatency
Rocksdb.getLatency
Rocksdb.submitLatency
Rocksdb.submitSyncLatency
RecoverystatePerf.repnotrecoveringLatency
RecoverystatePerf.waitupthruLatency
Osd.opRwPrepareLatency
RecoverystatePerf.primaryLatency
RecoverystatePerf.replicaactiveLatency
RecoverystatePerf.startedLatency
RecoverystatePerf.getlogLatency
RecoverystatePerf.initialLatency
RecoverystatePerf.recoveringLatency
ThrottleBluestoreThrottleBytes.wait
RecoverystatePerf.waitremoterecoveryreservedLatency
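One thing worth keeping in mind when picking from this list: Ceph's latency perf counters (as returned by the admin socket, e.g. `ceph daemon osd.<id> perf dump`) are cumulative objects with an `avgcount` (number of ops) and a `sum` (total time, in seconds). Dividing `sum` by `avgcount` gives the average since the OSD started, which smooths away exactly the kind of gradual creep discussed in this thread. The sketch below instead computes the average over the interval between two samples, for every counter of that shape. It assumes the names above (e.g. `Bluestore.commitLat`) correspond to `perf dump` entries such as `bluestore.commit_lat`; that mapping, and all function names here, are illustrative assumptions rather than part of any Ceph tool.

```python
from typing import Dict, Iterator, Optional, Tuple


def latency_counters(dump: dict, prefix: str = "") -> Iterator[Tuple[str, dict]]:
    """Yield (dotted.path, counter) for every nested object that looks like a
    long-running average counter, i.e. has both "avgcount" and "sum"."""
    for key, value in dump.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            if "avgcount" in value and "sum" in value:
                yield path, value
            else:
                yield from latency_counters(value, path)


def interval_latency_ms(prev: dict, curr: dict) -> Optional[float]:
    """Mean latency in ms of the ops completed between two samples.

    Assumes "sum" is cumulative seconds and "avgcount" the cumulative op
    count, so the deltas cover only recent ops, not the OSD's lifetime."""
    ops = curr["avgcount"] - prev["avgcount"]
    if ops <= 0:
        # No new ops, or the counters were reset (e.g. the OSD restarted).
        return None
    return (curr["sum"] - prev["sum"]) / ops * 1000.0


def interval_report(prev_dump: dict, curr_dump: dict) -> Dict[str, float]:
    """Interval mean latency (ms) for every counter present in both dumps."""
    prev = dict(latency_counters(prev_dump))
    report: Dict[str, float] = {}
    for path, curr in latency_counters(curr_dump):
        if path in prev:
            ms = interval_latency_ms(prev[path], curr)
            if ms is not None:
                report[path] = ms
    return report
```

The two dumps would come from `json.loads` on two `perf dump` outputs taken some minutes apart. With hypothetical samples `{"avgcount": 1000, "sum": 1.0}` and `{"avgcount": 3000, "sum": 7.0}` for the same counter, the interval mean is 6.0 s / 2000 ops = 3 ms, while the lifetime mean is 7.0 s / 3000 ops, about 2.3 ms.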



-----Original Message-----
From: Alexandre DERUMIER [mailto:aderumier@xxxxxxxxx] 
Sent: Friday, 25 January 2019 17:40
To: Sage Weil
Cc: ceph-users; ceph-devel
Subject: Re: ceph osd commit latency increase over time, until restart

Also, here is the result of "perf diff 1mslatency.perfdata 3mslatency.perfdata":

http://odisoweb1.odiso.net/perf_diff_ok_vs_bad.txt




----- Original Message -----
From: "aderumier" <aderumier@xxxxxxxxx>
To: "Sage Weil" <sage@xxxxxxxxxxxx>
Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>
Sent: Friday, 25 January 2019 17:32:02
Subject: Re: ceph osd commit latency increase over time, until restart

Hi again, 

I was able to profile it with perf today.

Before the restart, commit latency was between 3 and 5 ms.

After the restart at 17:11, latency is around 1 ms.

http://odisoweb1.odiso.net/osd3_latency_3ms_vs_1ms.png 


Here are some perf reports:

With 3 ms latency:
-----------------
perf report by caller: http://odisoweb1.odiso.net/bad-caller.txt
perf report by callee: http://odisoweb1.odiso.net/bad-callee.txt 


With 1 ms latency:
-----------------
perf report by caller: http://odisoweb1.odiso.net/ok-caller.txt
perf report by callee: http://odisoweb1.odiso.net/ok-callee.txt 



I'll retry next week, trying to get a bigger latency difference.

Alexandre 

----- Original Message -----
From: "aderumier" <aderumier@xxxxxxxxx>
To: "Sage Weil" <sage@xxxxxxxxxxxx>
Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>
Sent: Friday, 25 January 2019 11:06:51
Subject: Re: ceph osd commit latency increase over time, until restart

>>Can you capture a perf top or perf record to see where the CPU time is
>>going on one of the OSDs with a high latency?

Yes, sure. I'll do it next week and send the results to the mailing list.

Thanks, Sage!

----- Original Message -----
From: "Sage Weil" <sage@xxxxxxxxxxxx>
To: "aderumier" <aderumier@xxxxxxxxx>
Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>
Sent: Friday, 25 January 2019 10:49:02
Subject: Re: ceph osd commit latency increase over time, until restart

Can you capture a perf top or perf record to see where the CPU time is
going on one of the OSDs with a high latency?

Thanks! 
sage 


On Fri, 25 Jan 2019, Alexandre DERUMIER wrote: 

> 
> Hi,
> 
> I'm seeing strange behaviour from my OSDs on multiple clusters.
> 
> All clusters are running Mimic 13.2.1 with BlueStore, on SSD or NVMe
> drives. The workload is RBD only: qemu-kvm VMs using librbd, plus a
> daily snapshot / rbd export-diff / snapshot delete cycle for backups.
> 
> When the OSDs are freshly started, the commit latency is between
> 0.5 and 1 ms.
> 
> But over time, this latency increases slowly (maybe around 1 ms per day),
> until it reaches crazy values like 20-200 ms.
> 
> Some example graphs: 
> 
> http://odisoweb1.odiso.net/osdlatency1.png
> http://odisoweb1.odiso.net/osdlatency2.png
> 
> All OSDs show this behaviour, in all clusters.
> 
> The latency of the physical disks is fine (the clusters are far from
> fully loaded).
> 
> And if I restart an OSD, its latency comes back to 0.5-1 ms.
> 
> This reminds me of the old tcmalloc bug, but could it be a BlueStore
> memory bug?
> 
> Any hints on counters/logs to check?
> 
> 
> Regards,
> 
> Alexandre
> 
> 
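The report above estimates the creep at roughly 1 ms per day between restarts. A minimal sketch for putting a number on that from periodic samples of one OSD's commit latency (for instance one interval average per day) is an ordinary least-squares slope. The function names, the sampling scheme, and the assumption of linear growth are all illustrative; real latency may well not grow linearly, so a slope that stays consistently positive between restarts is the signal of interest, not the exact figure.

```python
from typing import Optional, Sequence, Tuple


def drift_ms_per_day(samples: Sequence[Tuple[float, float]]) -> float:
    """Ordinary least-squares slope of latency (ms) against time (days).

    Each sample is (day, latency_ms), e.g. (0.0, 1.0) right after a restart."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    num = sum((t - mean_t) * (y - mean_y) for t, y in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must be taken at different times")
    return num / den


def days_until(threshold_ms: float, current_ms: float, slope: float) -> Optional[float]:
    """Linear extrapolation of how many days until latency reaches threshold_ms.

    Returns None when latency is not increasing (slope <= 0)."""
    if slope <= 0:
        return None
    return max(0.0, (threshold_ms - current_ms) / slope)
```

With a slope of 1 ms/day and a current latency of 4 ms, the extrapolation gives 16 days to reach 20 ms; treat such a number as a rough early warning rather than a prediction.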

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
