Hi,

Some news: it seems that it has finally been stable for me for about a week
(around 0.7 ms average commit latency).

http://odisoweb1.odiso.net/osdstable.png

The biggest change was on 18/02, when I finished rebuilding all my OSDs,
with 2 OSDs of 3 TB per 6 TB NVMe. (Previously I had only done this on one
node, so maybe with replication I didn't see the benefit.)

I have also pushed bluestore_cache_kv_max to 1G, kept osd_memory_target at
its default, and disabled THP.

The different buffers also seem to be more constant.

But clearly, 2 smaller 3 TB OSDs with a 3G osd_memory_target behave
differently from 1 big 6 TB OSD with a 6G osd_memory_target. (Maybe
fragmentation, maybe RocksDB, maybe the number of objects in the cache;
I really don't know.)

----- Original Message -----
From: "Stefan Kooman" <stefan@xxxxxx>
To: "Wido den Hollander" <wido@xxxxxxxx>
Cc: "aderumier" <aderumier@xxxxxxxxx>, "Igor Fedotov" <ifedotov@xxxxxxx>, "ceph-users" <ceph-users@xxxxxxxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>
Sent: Thursday, 28 February 2019 21:57:05
Subject: Re: ceph osd commit latency increase over time, until restart

Quoting Wido den Hollander (wido@xxxxxxxx):
> Just wanted to chime in, I've seen this with Luminous+BlueStore+NVMe
> OSDs as well. Over time their latency increased until we started to
> notice I/O-wait inside VMs.

On a Luminous 12.2.8 cluster with only SSDs we also hit this issue, I
guess. After restarting the OSD servers the latency would drop back to
normal values again.

See https://owncloud.kooman.org/s/BpkUc7YM79vhcDj

Reboots were finished at ~ 19:00.

Gr. Stefan

--
| BIT BV http://www.bit.nl/ Kamer van Koophandel 09090351
| GPG: 0xD14839C6 +31 318 648 688 / info@xxxxxx

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
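
[Editor's note: for reference, a minimal sketch of the tuning Alexandre
describes above, assuming a Luminous/Mimic-era cluster where the options
are set in ceph.conf; the exact byte value and the [osd] section placement
are illustrative, and osd_memory_target is deliberately left unset, i.e.
at its default.]

    # /etc/ceph/ceph.conf (illustrative excerpt)
    [osd]
    # raise the BlueStore RocksDB key/value cache limit to 1 GiB
    bluestore_cache_kv_max = 1073741824
    # osd_memory_target intentionally left at its default

    # disable transparent huge pages (THP) until the next reboot
    echo never > /sys/kernel/mm/transparent_hugepage/enabled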
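
[Editor's note: a rough way to watch the per-OSD commit latency that the
graphs above track, using standard Ceph commands; osd.0 is only a
placeholder id.]

    # cluster-wide view of per-OSD commit/apply latency
    ceph osd perf

    # detailed perf counters for a single OSD via its admin socket
    ceph daemon osd.0 perf dump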