Hi,

Some news: it seems that it has finally been stable for me for about a week
(around 0.7 ms average commit latency).

http://odisoweb1.odiso.net/osdstable.png

The biggest change was on 18/02, when I finished rebuilding all my OSDs,
with 2 OSDs of 3 TB per 6 TB NVMe. (Previously I had only done this on one
node, so maybe with replication I didn't see the benefit.)

I have also pushed bluestore_cache_kv_max to 1G, kept osd_memory_target at
its default, and disabled THP.

The different buffers also seem to be more constant.

But clearly, 2 smaller 3 TB OSDs with a 3G osd_memory_target behave
differently from 1 big 6 TB OSD with a 6G osd_memory_target. (Maybe
fragmentation, maybe RocksDB, maybe the number of objects in the cache;
I really don't know.)

----- Original Message -----
From: "Stefan Kooman" <stefan@xxxxxx>
To: "Wido den Hollander" <wido@xxxxxxxx>
Cc: "aderumier" <aderumier@xxxxxxxxx>, "Igor Fedotov" <ifedotov@xxxxxxx>, "ceph-users" <ceph-users@xxxxxxxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>
Sent: Thursday, 28 February 2019 21:57:05
Subject: Re: ceph osd commit latency increase over time, until restart

Quoting Wido den Hollander (wido@xxxxxxxx):
> Just wanted to chime in, I've seen this with Luminous+BlueStore+NVMe
> OSDs as well. Over time their latency increased until we started to
> notice I/O-wait inside VMs.

On a Luminous 12.2.8 cluster with only SSDs we also hit this issue, I
guess. After restarting the OSD servers the latency would drop back to
normal values again.

See https://owncloud.kooman.org/s/BpkUc7YM79vhcDj

Reboots were finished at ~ 19:00.

Gr. Stefan

--
| BIT BV http://www.bit.nl/ Kamer van Koophandel 09090351
| GPG: 0xD14839C6 +31 318 648 688 / info@xxxxxx

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
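
[Editor's note: for reference, a minimal sketch of the tuning Alexandre
describes above, assuming a Luminous/Mimic-era cluster where the options
are set in ceph.conf; the exact byte value and the [osd] section placement
are illustrative, and osd_memory_target is deliberately left unset, i.e.
at its default.]

    # /etc/ceph/ceph.conf (illustrative excerpt)
    [osd]
    # raise the BlueStore RocksDB key/value cache limit to 1 GiB
    bluestore_cache_kv_max = 1073741824
    # osd_memory_target intentionally left at its default

    # disable transparent huge pages (THP) until the next reboot
    echo never > /sys/kernel/mm/transparent_hugepage/enabled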
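
[Editor's note: a rough way to watch the per-OSD commit latency that the
graphs above track, using standard Ceph commands; osd.0 is only a
placeholder id.]

    # cluster-wide view of per-OSD commit/apply latency
    ceph osd perf

    # detailed perf counters for a single OSD via its admin socket
    ceph daemon osd.0 perf dump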