>>I think op_w_process_latency includes replication times, not 100% sure though.
>>
>>So restarting other nodes might affect latencies at this specific OSD.

That seems to be the case; I have compared with subop_latency.

I have changed my graph to clearly identify the osd where the latency is high.

I have made some changes to my setup:
- 2 osds per nvme (2x 3TB per osd), with 6GB memory each (instead of 1 osd of 6TB with 12GB memory)
- disabled transparent hugepages

For the last 24h, latencies have stayed low (between 0.7-1.2ms).

I'm also seeing that the total memory used (#free) is lower than before: 48GB (8 osds x 6GB) vs 56GB before (4 osds x 12GB).

I'll send more stats tomorrow.

Alexandre

----- Mail original -----
De: "Igor Fedotov" <ifedotov@xxxxxxx>
À: "Alexandre Derumier" <aderumier@xxxxxxxxx>, "Wido den Hollander" <wido@xxxxxxxx>
Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>
Envoyé: Mardi 19 Février 2019 11:12:43
Objet: Re: ceph osd commit latency increase over time, until restart

Hi Alexander,

I think op_w_process_latency includes replication times, not 100% sure though.

So restarting other nodes might affect latencies at this specific OSD.

Thanks,
Igor

On 2/16/2019 11:29 AM, Alexandre DERUMIER wrote:
>>> There are 10 OSDs in these systems with 96GB of memory in total. We are
>>> running with memory target on 6G right now to make sure there is no
>>> leakage. If this runs fine for a longer period we will go to 8GB per OSD
>>> so it will max out on 80GB leaving 16GB as spare.
> Thanks Wido. I'll send results Monday with my increased memory.
>
> @Igor:
>
> I have also noticed that sometimes I have bad latency on an osd on node1 (restarted 12h ago, for example)
> (op_w_process_latency).
>
> If I restart osds on other nodes (last restart some days ago, so with bigger latency), it reduces the latency on the osd of node1 too.
>
> does the "op_w_process_latency" counter include replication time ?
>
> ----- Mail original -----
> De: "Wido den Hollander" <wido@xxxxxxxx>
> À: "aderumier" <aderumier@xxxxxxxxx>
> Cc: "Igor Fedotov" <ifedotov@xxxxxxx>, "ceph-users" <ceph-users@xxxxxxxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>
> Envoyé: Vendredi 15 Février 2019 14:59:30
> Objet: Re: ceph osd commit latency increase over time, until restart
>
> On 2/15/19 2:54 PM, Alexandre DERUMIER wrote:
>>>> Just wanted to chime in, I've seen this with Luminous+BlueStore+NVMe
>>>> OSDs as well. Over time their latency increased until we started to
>>>> notice I/O-wait inside VMs.
>> I also notice it in the VMs. BTW, what is your nvme disk size ?
> Samsung PM983 3.84TB SSDs in both clusters.
>
>>
>>>> A restart fixed it. We also increased memory target from 4G to 6G on
>>>> these OSDs as the memory would allow it.
>> I have set memory to 6GB this morning, with 2 osds of 3TB for the 6TB nvme.
>> (my last test was 8GB with 1 osd of 6TB, but that didn't help)
> There are 10 OSDs in these systems with 96GB of memory in total. We are
> running with memory target on 6G right now to make sure there is no
> leakage. If this runs fine for a longer period we will go to 8GB per OSD
> so it will max out on 80GB leaving 16GB as spare.
>
> As these OSDs were all restarted earlier this week I can't tell how it
> will hold up over a longer period. Monitoring (Zabbix) shows the latency
> is fine at the moment.
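[As a side note, the per-interval averages discussed in this thread (op_w_latency, op_w_process_latency, subop_w_latency) can also be derived directly from two admin-socket samples. Below is a minimal sketch, not the monitoring actually used here (which is InfluxDB/Zabbix); the osd id and the 60s interval are placeholders. It performs the same sum/avgcount delta that the non_negative_derivative queries quoted later in the thread compute per second.]

# Sketch: average OSD write latencies over an interval, from two "perf dump" samples.
# "osd.0" and the 60s sleep are placeholders; run on the host owning the OSD.
import json, subprocess, time

def perf_dump(osd="osd.0"):
    # Query the OSD admin socket via the ceph CLI and parse the JSON reply.
    return json.loads(subprocess.check_output(["ceph", "daemon", osd, "perf", "dump"]))

def avg_latency(before, after, counter):
    # Average seconds per op over the interval: delta(sum) / delta(avgcount).
    a, b = before["osd"][counter], after["osd"][counter]
    ops = b["avgcount"] - a["avgcount"]
    return (b["sum"] - a["sum"]) / ops if ops else 0.0

s1 = perf_dump()
time.sleep(60)
s2 = perf_dump()
for c in ("op_w_latency", "op_w_process_latency", "subop_w_latency"):
    print("%s: %.6f s" % (c, avg_latency(s1, s2, c)))

[Comparing op_w_process_latency on the primary against subop_w_latency on the replicas this way is essentially the comparison described above.]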
> > Wido > >> >> ----- Mail original ----- >> De: "Wido den Hollander" <wido@xxxxxxxx> >> À: "Alexandre Derumier" <aderumier@xxxxxxxxx>, "Igor Fedotov" <ifedotov@xxxxxxx> >> Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx> >> Envoyé: Vendredi 15 Février 2019 14:50:34 >> Objet: Re: ceph osd commit latency increase over time, until restart >> >> On 2/15/19 2:31 PM, Alexandre DERUMIER wrote: >>> Thanks Igor. >>> >>> I'll try to create multiple osds by nvme disk (6TB) to see if behaviour is different. >>> >>> I have other clusters (same ceph.conf), but with 1,6TB drives, and I don't see this latency problem. >>> >>> >> Just wanted to chime in, I've seen this with Luminous+BlueStore+NVMe >> OSDs as well. Over time their latency increased until we started to >> notice I/O-wait inside VMs. >> >> A restart fixed it. We also increased memory target from 4G to 6G on >> these OSDs as the memory would allow it. >> >> But we noticed this on two different 12.2.10/11 clusters. >> >> A restart made the latency drop. Not only the numbers, but the >> real-world latency as experienced by a VM as well. >> >> Wido >> >>> >>> >>> >>> >>> ----- Mail original ----- >>> De: "Igor Fedotov" <ifedotov@xxxxxxx> >>> Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx> >>> Envoyé: Vendredi 15 Février 2019 13:47:57 >>> Objet: Re: ceph osd commit latency increase over time, until restart >>> >>> Hi Alexander, >>> >>> I've read through your reports, nothing obvious so far. >>> >>> I can only see several times average latency increase for OSD write ops >>> (in seconds) >>> 0.002040060 (first hour) vs. >>> >>> 0.002483516 (last 24 hours) vs. >>> 0.008382087 (last hour) >>> >>> subop_w_latency: >>> 0.000478934 (first hour) vs. >>> 0.000537956 (last 24 hours) vs. >>> 0.003073475 (last hour) >>> >>> and OSD read ops, osd_r_latency: >>> >>> 0.000408595 (first hour) >>> 0.000709031 (24 hours) >>> 0.004979540 (last hour) >>> >>> What's interesting is that such latency differences aren't observed at >>> neither BlueStore level (any _lat params under "bluestore" section) nor >>> rocksdb one. >>> >>> Which probably means that the issue is rather somewhere above BlueStore. >>> >>> Suggest to proceed with perf dumps collection to see if the picture >>> stays the same. >>> >>> W.r.t. memory usage you observed I see nothing suspicious so far - No >>> decrease in RSS report is a known artifact that seems to be safe. >>> >>> Thanks, >>> Igor >>> >>> On 2/13/2019 11:42 AM, Alexandre DERUMIER wrote: >>>> Hi Igor, >>>> >>>> Thanks again for helping ! >>>> >>>> >>>> >>>> I have upgrade to last mimic this weekend, and with new autotune memory, >>>> I have setup osd_memory_target to 8G. 
(my nvme are 6TB) >>>> >>>> >>>> I have done a lot of perf dump and mempool dump and ps of process to >>> see rss memory at different hours, >>>> here the reports for osd.0: >>>> >>>> http://odisoweb1.odiso.net/perfanalysis/ >>>> >>>> >>>> osd has been started the 12-02-2019 at 08:00 >>>> >>>> first report after 1h running >>>> http://odisoweb1.odiso.net/perfanalysis/osd.0.12-02-2018.09:30.perf.txt >>>> >>> http://odisoweb1.odiso.net/perfanalysis/osd.0.12-02-2018.09:30.dump_mempools.txt >>>> http://odisoweb1.odiso.net/perfanalysis/osd.0.12-02-2018.09:30.ps.txt >>>> >>>> >>>> >>>> report after 24 before counter resets >>>> >>>> http://odisoweb1.odiso.net/perfanalysis/osd.0.13-02-2018.08:00.perf.txt >>>> >>> http://odisoweb1.odiso.net/perfanalysis/osd.0.13-02-2018.08:00.dump_mempools.txt >>>> http://odisoweb1.odiso.net/perfanalysis/osd.0.13-02-2018.08:00.ps.txt >>>> >>>> report 1h after counter reset >>>> http://odisoweb1.odiso.net/perfanalysis/osd.0.13-02-2018.09:00.perf.txt >>>> >>> http://odisoweb1.odiso.net/perfanalysis/osd.0.13-02-2018.09:00.dump_mempools.txt >>>> http://odisoweb1.odiso.net/perfanalysis/osd.0.13-02-2018.09:00.ps.txt >>>> >>>> >>>> >>>> >>>> I'm seeing the bluestore buffer bytes memory increasing up to 4G >>> around 12-02-2019 at 14:00 >>>> http://odisoweb1.odiso.net/perfanalysis/graphs2.png >>>> Then after that, slowly decreasing. >>>> >>>> >>>> Another strange thing, >>>> I'm seeing total bytes at 5G at 12-02-2018.13:30 >>>> >>> http://odisoweb1.odiso.net/perfanalysis/osd.0.12-02-2018.13:30.dump_mempools.txt >>>> Then is decreasing over time (around 3,7G this morning), but RSS is >>> still at 8G >>>> >>>> I'm graphing mempools counters too since yesterday, so I'll able to >>> track them over time. >>>> ----- Mail original ----- >>>> De: "Igor Fedotov" <ifedotov@xxxxxxx> >>>> À: "Alexandre Derumier" <aderumier@xxxxxxxxx> >>>> Cc: "Sage Weil" <sage@xxxxxxxxxxxx>, "ceph-users" >>> <ceph-users@xxxxxxxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx> >>>> Envoyé: Lundi 11 Février 2019 12:03:17 >>>> Objet: Re: ceph osd commit latency increase over time, >>> until restart >>>> On 2/8/2019 6:57 PM, Alexandre DERUMIER wrote: >>>>> another mempool dump after 1h run. (latency ok) >>>>> >>>>> Biggest difference: >>>>> >>>>> before restart >>>>> ------------- >>>>> "bluestore_cache_other": { >>>>> "items": 48661920, >>>>> "bytes": 1539544228 >>>>> }, >>>>> "bluestore_cache_data": { >>>>> "items": 54, >>>>> "bytes": 643072 >>>>> }, >>>>> (other caches seem to be quite low too, like bluestore_cache_other >>> take all the memory) >>>>> >>>>> After restart >>>>> ------------- >>>>> "bluestore_cache_other": { >>>>> "items": 12432298, >>>>> "bytes": 500834899 >>>>> }, >>>>> "bluestore_cache_data": { >>>>> "items": 40084, >>>>> "bytes": 1056235520 >>>>> }, >>>>> >>>> This is fine as cache is warming after restart and some rebalancing >>>> between data and metadata might occur. >>>> >>>> What relates to allocator and most probably to fragmentation growth is : >>>> >>>> "bluestore_alloc": { >>>> "items": 165053952, >>>> "bytes": 165053952 >>>> }, >>>> >>>> which had been higher before the reset (if I got these dumps' order >>>> properly) >>>> >>>> "bluestore_alloc": { >>>> "items": 210243456, >>>> "bytes": 210243456 >>>> }, >>>> >>>> But as I mentioned - I'm not 100% sure this might cause such a huge >>>> latency increase... >>>> >>>> Do you have perf counters dump after the restart? >>>> >>>> Could you collect some more dumps - for both mempool and perf counters? 
>>>> >>>> So ideally I'd like to have: >>>> >>>> 1) mempool/perf counters dumps after the restart (1hour is OK) >>>> >>>> 2) mempool/perf counters dumps in 24+ hours after restart >>>> >>>> 3) reset perf counters after 2), wait for 1 hour (and without OSD >>>> restart) and dump mempool/perf counters again. >>>> >>>> So we'll be able to learn both allocator mem usage growth and operation >>>> latency distribution for the following periods: >>>> >>>> a) 1st hour after restart >>>> >>>> b) 25th hour. >>>> >>>> >>>> Thanks, >>>> >>>> Igor >>>> >>>> >>>>> full mempool dump after restart >>>>> ------------------------------- >>>>> >>>>> { >>>>> "mempool": { >>>>> "by_pool": { >>>>> "bloom_filter": { >>>>> "items": 0, >>>>> "bytes": 0 >>>>> }, >>>>> "bluestore_alloc": { >>>>> "items": 165053952, >>>>> "bytes": 165053952 >>>>> }, >>>>> "bluestore_cache_data": { >>>>> "items": 40084, >>>>> "bytes": 1056235520 >>>>> }, >>>>> "bluestore_cache_onode": { >>>>> "items": 22225, >>>>> "bytes": 14935200 >>>>> }, >>>>> "bluestore_cache_other": { >>>>> "items": 12432298, >>>>> "bytes": 500834899 >>>>> }, >>>>> "bluestore_fsck": { >>>>> "items": 0, >>>>> "bytes": 0 >>>>> }, >>>>> "bluestore_txc": { >>>>> "items": 11, >>>>> "bytes": 8184 >>>>> }, >>>>> "bluestore_writing_deferred": { >>>>> "items": 5047, >>>>> "bytes": 22673736 >>>>> }, >>>>> "bluestore_writing": { >>>>> "items": 91, >>>>> "bytes": 1662976 >>>>> }, >>>>> "bluefs": { >>>>> "items": 1907, >>>>> "bytes": 95600 >>>>> }, >>>>> "buffer_anon": { >>>>> "items": 19664, >>>>> "bytes": 25486050 >>>>> }, >>>>> "buffer_meta": { >>>>> "items": 46189, >>>>> "bytes": 2956096 >>>>> }, >>>>> "osd": { >>>>> "items": 243, >>>>> "bytes": 3089016 >>>>> }, >>>>> "osd_mapbl": { >>>>> "items": 17, >>>>> "bytes": 214366 >>>>> }, >>>>> "osd_pglog": { >>>>> "items": 889673, >>>>> "bytes": 367160400 >>>>> }, >>>>> "osdmap": { >>>>> "items": 3803, >>>>> "bytes": 224552 >>>>> }, >>>>> "osdmap_mapping": { >>>>> "items": 0, >>>>> "bytes": 0 >>>>> }, >>>>> "pgmap": { >>>>> "items": 0, >>>>> "bytes": 0 >>>>> }, >>>>> "mds_co": { >>>>> "items": 0, >>>>> "bytes": 0 >>>>> }, >>>>> "unittest_1": { >>>>> "items": 0, >>>>> "bytes": 0 >>>>> }, >>>>> "unittest_2": { >>>>> "items": 0, >>>>> "bytes": 0 >>>>> } >>>>> }, >>>>> "total": { >>>>> "items": 178515204, >>>>> "bytes": 2160630547 >>>>> } >>>>> } >>>>> } >>>>> >>>>> ----- Mail original ----- >>>>> De: "aderumier" <aderumier@xxxxxxxxx> >>>>> À: "Igor Fedotov" <ifedotov@xxxxxxx> >>>>> Cc: "Stefan Priebe, Profihost AG" <s.priebe@xxxxxxxxxxxx>, "Mark >>> Nelson" <mnelson@xxxxxxxxxx>, "Sage Weil" <sage@xxxxxxxxxxxx>, >>> "ceph-users" <ceph-users@xxxxxxxxxxxxxx>, "ceph-devel" >>> <ceph-devel@xxxxxxxxxxxxxxx> >>>>> Envoyé: Vendredi 8 Février 2019 16:14:54 >>>>> Objet: Re: ceph osd commit latency increase over time, >>> until restart >>>>> I'm just seeing >>>>> >>>>> StupidAllocator::_aligned_len >>>>> and >>>>> >>> btree::btree_iterator<btree::btree_node<btree::btree_map_params<unsigned >>> long, unsigned long, std::less<unsigned long>, mempoo >>>>> on 1 osd, both 10%. 
>>>>> >>>>> here the dump_mempools >>>>> >>>>> { >>>>> "mempool": { >>>>> "by_pool": { >>>>> "bloom_filter": { >>>>> "items": 0, >>>>> "bytes": 0 >>>>> }, >>>>> "bluestore_alloc": { >>>>> "items": 210243456, >>>>> "bytes": 210243456 >>>>> }, >>>>> "bluestore_cache_data": { >>>>> "items": 54, >>>>> "bytes": 643072 >>>>> }, >>>>> "bluestore_cache_onode": { >>>>> "items": 105637, >>>>> "bytes": 70988064 >>>>> }, >>>>> "bluestore_cache_other": { >>>>> "items": 48661920, >>>>> "bytes": 1539544228 >>>>> }, >>>>> "bluestore_fsck": { >>>>> "items": 0, >>>>> "bytes": 0 >>>>> }, >>>>> "bluestore_txc": { >>>>> "items": 12, >>>>> "bytes": 8928 >>>>> }, >>>>> "bluestore_writing_deferred": { >>>>> "items": 406, >>>>> "bytes": 4792868 >>>>> }, >>>>> "bluestore_writing": { >>>>> "items": 66, >>>>> "bytes": 1085440 >>>>> }, >>>>> "bluefs": { >>>>> "items": 1882, >>>>> "bytes": 93600 >>>>> }, >>>>> "buffer_anon": { >>>>> "items": 138986, >>>>> "bytes": 24983701 >>>>> }, >>>>> "buffer_meta": { >>>>> "items": 544, >>>>> "bytes": 34816 >>>>> }, >>>>> "osd": { >>>>> "items": 243, >>>>> "bytes": 3089016 >>>>> }, >>>>> "osd_mapbl": { >>>>> "items": 36, >>>>> "bytes": 179308 >>>>> }, >>>>> "osd_pglog": { >>>>> "items": 952564, >>>>> "bytes": 372459684 >>>>> }, >>>>> "osdmap": { >>>>> "items": 3639, >>>>> "bytes": 224664 >>>>> }, >>>>> "osdmap_mapping": { >>>>> "items": 0, >>>>> "bytes": 0 >>>>> }, >>>>> "pgmap": { >>>>> "items": 0, >>>>> "bytes": 0 >>>>> }, >>>>> "mds_co": { >>>>> "items": 0, >>>>> "bytes": 0 >>>>> }, >>>>> "unittest_1": { >>>>> "items": 0, >>>>> "bytes": 0 >>>>> }, >>>>> "unittest_2": { >>>>> "items": 0, >>>>> "bytes": 0 >>>>> } >>>>> }, >>>>> "total": { >>>>> "items": 260109445, >>>>> "bytes": 2228370845 >>>>> } >>>>> } >>>>> } >>>>> >>>>> >>>>> and the perf dump >>>>> >>>>> root@ceph5-2:~# ceph daemon osd.4 perf dump >>>>> { >>>>> "AsyncMessenger::Worker-0": { >>>>> "msgr_recv_messages": 22948570, >>>>> "msgr_send_messages": 22561570, >>>>> "msgr_recv_bytes": 333085080271, >>>>> "msgr_send_bytes": 261798871204, >>>>> "msgr_created_connections": 6152, >>>>> "msgr_active_connections": 2701, >>>>> "msgr_running_total_time": 1055.197867330, >>>>> "msgr_running_send_time": 352.764480121, >>>>> "msgr_running_recv_time": 499.206831955, >>>>> "msgr_running_fast_dispatch_time": 130.982201607 >>>>> }, >>>>> "AsyncMessenger::Worker-1": { >>>>> "msgr_recv_messages": 18801593, >>>>> "msgr_send_messages": 18430264, >>>>> "msgr_recv_bytes": 306871760934, >>>>> "msgr_send_bytes": 192789048666, >>>>> "msgr_created_connections": 5773, >>>>> "msgr_active_connections": 2721, >>>>> "msgr_running_total_time": 816.821076305, >>>>> "msgr_running_send_time": 261.353228926, >>>>> "msgr_running_recv_time": 394.035587911, >>>>> "msgr_running_fast_dispatch_time": 104.012155720 >>>>> }, >>>>> "AsyncMessenger::Worker-2": { >>>>> "msgr_recv_messages": 18463400, >>>>> "msgr_send_messages": 18105856, >>>>> "msgr_recv_bytes": 187425453590, >>>>> "msgr_send_bytes": 220735102555, >>>>> "msgr_created_connections": 5897, >>>>> "msgr_active_connections": 2605, >>>>> "msgr_running_total_time": 807.186854324, >>>>> "msgr_running_send_time": 296.834435839, >>>>> "msgr_running_recv_time": 351.364389691, >>>>> "msgr_running_fast_dispatch_time": 101.215776792 >>>>> }, >>>>> "bluefs": { >>>>> "gift_bytes": 0, >>>>> "reclaim_bytes": 0, >>>>> "db_total_bytes": 256050724864, >>>>> "db_used_bytes": 12413042688, >>>>> "wal_total_bytes": 0, >>>>> "wal_used_bytes": 0, >>>>> "slow_total_bytes": 0, >>>>> "slow_used_bytes": 0, >>>>> "num_files": 209, 
>>>>> "log_bytes": 10383360, >>>>> "log_compactions": 14, >>>>> "logged_bytes": 336498688, >>>>> "files_written_wal": 2, >>>>> "files_written_sst": 4499, >>>>> "bytes_written_wal": 417989099783, >>>>> "bytes_written_sst": 213188750209 >>>>> }, >>>>> "bluestore": { >>>>> "kv_flush_lat": { >>>>> "avgcount": 26371957, >>>>> "sum": 26.734038497, >>>>> "avgtime": 0.000001013 >>>>> }, >>>>> "kv_commit_lat": { >>>>> "avgcount": 26371957, >>>>> "sum": 3397.491150603, >>>>> "avgtime": 0.000128829 >>>>> }, >>>>> "kv_lat": { >>>>> "avgcount": 26371957, >>>>> "sum": 3424.225189100, >>>>> "avgtime": 0.000129843 >>>>> }, >>>>> "state_prepare_lat": { >>>>> "avgcount": 30484924, >>>>> "sum": 3689.542105337, >>>>> "avgtime": 0.000121028 >>>>> }, >>>>> "state_aio_wait_lat": { >>>>> "avgcount": 30484924, >>>>> "sum": 509.864546111, >>>>> "avgtime": 0.000016725 >>>>> }, >>>>> "state_io_done_lat": { >>>>> "avgcount": 30484924, >>>>> "sum": 24.534052953, >>>>> "avgtime": 0.000000804 >>>>> }, >>>>> "state_kv_queued_lat": { >>>>> "avgcount": 30484924, >>>>> "sum": 3488.338424238, >>>>> "avgtime": 0.000114428 >>>>> }, >>>>> "state_kv_commiting_lat": { >>>>> "avgcount": 30484924, >>>>> "sum": 5660.437003432, >>>>> "avgtime": 0.000185679 >>>>> }, >>>>> "state_kv_done_lat": { >>>>> "avgcount": 30484924, >>>>> "sum": 7.763511500, >>>>> "avgtime": 0.000000254 >>>>> }, >>>>> "state_deferred_queued_lat": { >>>>> "avgcount": 26346134, >>>>> "sum": 666071.296856696, >>>>> "avgtime": 0.025281557 >>>>> }, >>>>> "state_deferred_aio_wait_lat": { >>>>> "avgcount": 26346134, >>>>> "sum": 1755.660547071, >>>>> "avgtime": 0.000066638 >>>>> }, >>>>> "state_deferred_cleanup_lat": { >>>>> "avgcount": 26346134, >>>>> "sum": 185465.151653703, >>>>> "avgtime": 0.007039558 >>>>> }, >>>>> "state_finishing_lat": { >>>>> "avgcount": 30484920, >>>>> "sum": 3.046847481, >>>>> "avgtime": 0.000000099 >>>>> }, >>>>> "state_done_lat": { >>>>> "avgcount": 30484920, >>>>> "sum": 13193.362685280, >>>>> "avgtime": 0.000432783 >>>>> }, >>>>> "throttle_lat": { >>>>> "avgcount": 30484924, >>>>> "sum": 14.634269979, >>>>> "avgtime": 0.000000480 >>>>> }, >>>>> "submit_lat": { >>>>> "avgcount": 30484924, >>>>> "sum": 3873.883076148, >>>>> "avgtime": 0.000127075 >>>>> }, >>>>> "commit_lat": { >>>>> "avgcount": 30484924, >>>>> "sum": 13376.492317331, >>>>> "avgtime": 0.000438790 >>>>> }, >>>>> "read_lat": { >>>>> "avgcount": 5873923, >>>>> "sum": 1817.167582057, >>>>> "avgtime": 0.000309361 >>>>> }, >>>>> "read_onode_meta_lat": { >>>>> "avgcount": 19608201, >>>>> "sum": 146.770464482, >>>>> "avgtime": 0.000007485 >>>>> }, >>>>> "read_wait_aio_lat": { >>>>> "avgcount": 13734278, >>>>> "sum": 2532.578077242, >>>>> "avgtime": 0.000184398 >>>>> }, >>>>> "compress_lat": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> }, >>>>> "decompress_lat": { >>>>> "avgcount": 1346945, >>>>> "sum": 26.227575896, >>>>> "avgtime": 0.000019471 >>>>> }, >>>>> "csum_lat": { >>>>> "avgcount": 28020392, >>>>> "sum": 149.587819041, >>>>> "avgtime": 0.000005338 >>>>> }, >>>>> "compress_success_count": 0, >>>>> "compress_rejected_count": 0, >>>>> "write_pad_bytes": 352923605, >>>>> "deferred_write_ops": 24373340, >>>>> "deferred_write_bytes": 216791842816, >>>>> "write_penalty_read_ops": 8062366, >>>>> "bluestore_allocated": 3765566013440, >>>>> "bluestore_stored": 4186255221852, >>>>> "bluestore_compressed": 39981379040, >>>>> "bluestore_compressed_allocated": 73748348928, >>>>> "bluestore_compressed_original": 165041381376, >>>>> "bluestore_onodes": 
104232, >>>>> "bluestore_onode_hits": 71206874, >>>>> "bluestore_onode_misses": 1217914, >>>>> "bluestore_onode_shard_hits": 260183292, >>>>> "bluestore_onode_shard_misses": 22851573, >>>>> "bluestore_extents": 3394513, >>>>> "bluestore_blobs": 2773587, >>>>> "bluestore_buffers": 0, >>>>> "bluestore_buffer_bytes": 0, >>>>> "bluestore_buffer_hit_bytes": 62026011221, >>>>> "bluestore_buffer_miss_bytes": 995233669922, >>>>> "bluestore_write_big": 5648815, >>>>> "bluestore_write_big_bytes": 552502214656, >>>>> "bluestore_write_big_blobs": 12440992, >>>>> "bluestore_write_small": 35883770, >>>>> "bluestore_write_small_bytes": 223436965719, >>>>> "bluestore_write_small_unused": 408125, >>>>> "bluestore_write_small_deferred": 34961455, >>>>> "bluestore_write_small_pre_read": 34961455, >>>>> "bluestore_write_small_new": 514190, >>>>> "bluestore_txc": 30484924, >>>>> "bluestore_onode_reshard": 5144189, >>>>> "bluestore_blob_split": 60104, >>>>> "bluestore_extent_compress": 53347252, >>>>> "bluestore_gc_merged": 21142528, >>>>> "bluestore_read_eio": 0, >>>>> "bluestore_fragmentation_micros": 67 >>>>> }, >>>>> "finisher-defered_finisher": { >>>>> "queue_len": 0, >>>>> "complete_latency": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> } >>>>> }, >>>>> "finisher-finisher-0": { >>>>> "queue_len": 0, >>>>> "complete_latency": { >>>>> "avgcount": 26625163, >>>>> "sum": 1057.506990951, >>>>> "avgtime": 0.000039718 >>>>> } >>>>> }, >>>>> "finisher-objecter-finisher-0": { >>>>> "queue_len": 0, >>>>> "complete_latency": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> } >>>>> }, >>>>> "mutex-OSDShard.0::sdata_wait_lock": { >>>>> "wait": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> } >>>>> }, >>>>> "mutex-OSDShard.0::shard_lock": { >>>>> "wait": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> } >>>>> }, >>>>> "mutex-OSDShard.1::sdata_wait_lock": { >>>>> "wait": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> } >>>>> }, >>>>> "mutex-OSDShard.1::shard_lock": { >>>>> "wait": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> } >>>>> }, >>>>> "mutex-OSDShard.2::sdata_wait_lock": { >>>>> "wait": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> } >>>>> }, >>>>> "mutex-OSDShard.2::shard_lock": { >>>>> "wait": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> } >>>>> }, >>>>> "mutex-OSDShard.3::sdata_wait_lock": { >>>>> "wait": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> } >>>>> }, >>>>> "mutex-OSDShard.3::shard_lock": { >>>>> "wait": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> } >>>>> }, >>>>> "mutex-OSDShard.4::sdata_wait_lock": { >>>>> "wait": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> } >>>>> }, >>>>> "mutex-OSDShard.4::shard_lock": { >>>>> "wait": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> } >>>>> }, >>>>> "mutex-OSDShard.5::sdata_wait_lock": { >>>>> "wait": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> } >>>>> }, >>>>> "mutex-OSDShard.5::shard_lock": { >>>>> "wait": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> } >>>>> }, >>>>> "mutex-OSDShard.6::sdata_wait_lock": { >>>>> "wait": { >>>>> "avgcount": 
0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> } >>>>> }, >>>>> "mutex-OSDShard.6::shard_lock": { >>>>> "wait": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> } >>>>> }, >>>>> "mutex-OSDShard.7::sdata_wait_lock": { >>>>> "wait": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> } >>>>> }, >>>>> "mutex-OSDShard.7::shard_lock": { >>>>> "wait": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> } >>>>> }, >>>>> "objecter": { >>>>> "op_active": 0, >>>>> "op_laggy": 0, >>>>> "op_send": 0, >>>>> "op_send_bytes": 0, >>>>> "op_resend": 0, >>>>> "op_reply": 0, >>>>> "op": 0, >>>>> "op_r": 0, >>>>> "op_w": 0, >>>>> "op_rmw": 0, >>>>> "op_pg": 0, >>>>> "osdop_stat": 0, >>>>> "osdop_create": 0, >>>>> "osdop_read": 0, >>>>> "osdop_write": 0, >>>>> "osdop_writefull": 0, >>>>> "osdop_writesame": 0, >>>>> "osdop_append": 0, >>>>> "osdop_zero": 0, >>>>> "osdop_truncate": 0, >>>>> "osdop_delete": 0, >>>>> "osdop_mapext": 0, >>>>> "osdop_sparse_read": 0, >>>>> "osdop_clonerange": 0, >>>>> "osdop_getxattr": 0, >>>>> "osdop_setxattr": 0, >>>>> "osdop_cmpxattr": 0, >>>>> "osdop_rmxattr": 0, >>>>> "osdop_resetxattrs": 0, >>>>> "osdop_tmap_up": 0, >>>>> "osdop_tmap_put": 0, >>>>> "osdop_tmap_get": 0, >>>>> "osdop_call": 0, >>>>> "osdop_watch": 0, >>>>> "osdop_notify": 0, >>>>> "osdop_src_cmpxattr": 0, >>>>> "osdop_pgls": 0, >>>>> "osdop_pgls_filter": 0, >>>>> "osdop_other": 0, >>>>> "linger_active": 0, >>>>> "linger_send": 0, >>>>> "linger_resend": 0, >>>>> "linger_ping": 0, >>>>> "poolop_active": 0, >>>>> "poolop_send": 0, >>>>> "poolop_resend": 0, >>>>> "poolstat_active": 0, >>>>> "poolstat_send": 0, >>>>> "poolstat_resend": 0, >>>>> "statfs_active": 0, >>>>> "statfs_send": 0, >>>>> "statfs_resend": 0, >>>>> "command_active": 0, >>>>> "command_send": 0, >>>>> "command_resend": 0, >>>>> "map_epoch": 105913, >>>>> "map_full": 0, >>>>> "map_inc": 828, >>>>> "osd_sessions": 0, >>>>> "osd_session_open": 0, >>>>> "osd_session_close": 0, >>>>> "osd_laggy": 0, >>>>> "omap_wr": 0, >>>>> "omap_rd": 0, >>>>> "omap_del": 0 >>>>> }, >>>>> "osd": { >>>>> "op_wip": 0, >>>>> "op": 16758102, >>>>> "op_in_bytes": 238398820586, >>>>> "op_out_bytes": 165484999463, >>>>> "op_latency": { >>>>> "avgcount": 16758102, >>>>> "sum": 38242.481640842, >>>>> "avgtime": 0.002282029 >>>>> }, >>>>> "op_process_latency": { >>>>> "avgcount": 16758102, >>>>> "sum": 28644.906310687, >>>>> "avgtime": 0.001709316 >>>>> }, >>>>> "op_prepare_latency": { >>>>> "avgcount": 16761367, >>>>> "sum": 3489.856599934, >>>>> "avgtime": 0.000208208 >>>>> }, >>>>> "op_r": 6188565, >>>>> "op_r_out_bytes": 165484999463, >>>>> "op_r_latency": { >>>>> "avgcount": 6188565, >>>>> "sum": 4507.365756792, >>>>> "avgtime": 0.000728337 >>>>> }, >>>>> "op_r_process_latency": { >>>>> "avgcount": 6188565, >>>>> "sum": 942.363063429, >>>>> "avgtime": 0.000152274 >>>>> }, >>>>> "op_r_prepare_latency": { >>>>> "avgcount": 6188644, >>>>> "sum": 982.866710389, >>>>> "avgtime": 0.000158817 >>>>> }, >>>>> "op_w": 10546037, >>>>> "op_w_in_bytes": 238334329494, >>>>> "op_w_latency": { >>>>> "avgcount": 10546037, >>>>> "sum": 33160.719998316, >>>>> "avgtime": 0.003144377 >>>>> }, >>>>> "op_w_process_latency": { >>>>> "avgcount": 10546037, >>>>> "sum": 27668.702029030, >>>>> "avgtime": 0.002623611 >>>>> }, >>>>> "op_w_prepare_latency": { >>>>> "avgcount": 10548652, >>>>> "sum": 2499.688609173, >>>>> "avgtime": 0.000236967 >>>>> }, >>>>> "op_rw": 23500, >>>>> 
"op_rw_in_bytes": 64491092, >>>>> "op_rw_out_bytes": 0, >>>>> "op_rw_latency": { >>>>> "avgcount": 23500, >>>>> "sum": 574.395885734, >>>>> "avgtime": 0.024442378 >>>>> }, >>>>> "op_rw_process_latency": { >>>>> "avgcount": 23500, >>>>> "sum": 33.841218228, >>>>> "avgtime": 0.001440051 >>>>> }, >>>>> "op_rw_prepare_latency": { >>>>> "avgcount": 24071, >>>>> "sum": 7.301280372, >>>>> "avgtime": 0.000303322 >>>>> }, >>>>> "op_before_queue_op_lat": { >>>>> "avgcount": 57892986, >>>>> "sum": 1502.117718889, >>>>> "avgtime": 0.000025946 >>>>> }, >>>>> "op_before_dequeue_op_lat": { >>>>> "avgcount": 58091683, >>>>> "sum": 45194.453254037, >>>>> "avgtime": 0.000777984 >>>>> }, >>>>> "subop": 19784758, >>>>> "subop_in_bytes": 547174969754, >>>>> "subop_latency": { >>>>> "avgcount": 19784758, >>>>> "sum": 13019.714424060, >>>>> "avgtime": 0.000658067 >>>>> }, >>>>> "subop_w": 19784758, >>>>> "subop_w_in_bytes": 547174969754, >>>>> "subop_w_latency": { >>>>> "avgcount": 19784758, >>>>> "sum": 13019.714424060, >>>>> "avgtime": 0.000658067 >>>>> }, >>>>> "subop_pull": 0, >>>>> "subop_pull_latency": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> }, >>>>> "subop_push": 0, >>>>> "subop_push_in_bytes": 0, >>>>> "subop_push_latency": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> }, >>>>> "pull": 0, >>>>> "push": 2003, >>>>> "push_out_bytes": 5560009728, >>>>> "recovery_ops": 1940, >>>>> "loadavg": 118, >>>>> "buffer_bytes": 0, >>>>> "history_alloc_Mbytes": 0, >>>>> "history_alloc_num": 0, >>>>> "cached_crc": 0, >>>>> "cached_crc_adjusted": 0, >>>>> "missed_crc": 0, >>>>> "numpg": 243, >>>>> "numpg_primary": 82, >>>>> "numpg_replica": 161, >>>>> "numpg_stray": 0, >>>>> "numpg_removing": 0, >>>>> "heartbeat_to_peers": 10, >>>>> "map_messages": 7013, >>>>> "map_message_epochs": 7143, >>>>> "map_message_epoch_dups": 6315, >>>>> "messages_delayed_for_map": 0, >>>>> "osd_map_cache_hit": 203309, >>>>> "osd_map_cache_miss": 33, >>>>> "osd_map_cache_miss_low": 0, >>>>> "osd_map_cache_miss_low_avg": { >>>>> "avgcount": 0, >>>>> "sum": 0 >>>>> }, >>>>> "osd_map_bl_cache_hit": 47012, >>>>> "osd_map_bl_cache_miss": 1681, >>>>> "stat_bytes": 6401248198656, >>>>> "stat_bytes_used": 3777979072512, >>>>> "stat_bytes_avail": 2623269126144, >>>>> "copyfrom": 0, >>>>> "tier_promote": 0, >>>>> "tier_flush": 0, >>>>> "tier_flush_fail": 0, >>>>> "tier_try_flush": 0, >>>>> "tier_try_flush_fail": 0, >>>>> "tier_evict": 0, >>>>> "tier_whiteout": 1631, >>>>> "tier_dirty": 22360, >>>>> "tier_clean": 0, >>>>> "tier_delay": 0, >>>>> "tier_proxy_read": 0, >>>>> "tier_proxy_write": 0, >>>>> "agent_wake": 0, >>>>> "agent_skip": 0, >>>>> "agent_flush": 0, >>>>> "agent_evict": 0, >>>>> "object_ctx_cache_hit": 16311156, >>>>> "object_ctx_cache_total": 17426393, >>>>> "op_cache_hit": 0, >>>>> "osd_tier_flush_lat": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> }, >>>>> "osd_tier_promote_lat": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> }, >>>>> "osd_tier_r_lat": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> }, >>>>> "osd_pg_info": 30483113, >>>>> "osd_pg_fastinfo": 29619885, >>>>> "osd_pg_biginfo": 81703 >>>>> }, >>>>> "recoverystate_perf": { >>>>> "initial_latency": { >>>>> "avgcount": 243, >>>>> "sum": 6.869296500, >>>>> "avgtime": 0.028268709 >>>>> }, >>>>> "started_latency": { >>>>> "avgcount": 1125, >>>>> "sum": 13551384.917335850, >>>>> "avgtime": 
12045.675482076 >>>>> }, >>>>> "reset_latency": { >>>>> "avgcount": 1368, >>>>> "sum": 1101.727799040, >>>>> "avgtime": 0.805356578 >>>>> }, >>>>> "start_latency": { >>>>> "avgcount": 1368, >>>>> "sum": 0.002014799, >>>>> "avgtime": 0.000001472 >>>>> }, >>>>> "primary_latency": { >>>>> "avgcount": 507, >>>>> "sum": 4575560.638823428, >>>>> "avgtime": 9024.774435549 >>>>> }, >>>>> "peering_latency": { >>>>> "avgcount": 550, >>>>> "sum": 499.372283616, >>>>> "avgtime": 0.907949606 >>>>> }, >>>>> "backfilling_latency": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> }, >>>>> "waitremotebackfillreserved_latency": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> }, >>>>> "waitlocalbackfillreserved_latency": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> }, >>>>> "notbackfilling_latency": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> }, >>>>> "repnotrecovering_latency": { >>>>> "avgcount": 1009, >>>>> "sum": 8975301.082274411, >>>>> "avgtime": 8895.243887288 >>>>> }, >>>>> "repwaitrecoveryreserved_latency": { >>>>> "avgcount": 420, >>>>> "sum": 99.846056520, >>>>> "avgtime": 0.237728706 >>>>> }, >>>>> "repwaitbackfillreserved_latency": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> }, >>>>> "reprecovering_latency": { >>>>> "avgcount": 420, >>>>> "sum": 241.682764382, >>>>> "avgtime": 0.575435153 >>>>> }, >>>>> "activating_latency": { >>>>> "avgcount": 507, >>>>> "sum": 16.893347339, >>>>> "avgtime": 0.033320211 >>>>> }, >>>>> "waitlocalrecoveryreserved_latency": { >>>>> "avgcount": 199, >>>>> "sum": 672.335512769, >>>>> "avgtime": 3.378570415 >>>>> }, >>>>> "waitremoterecoveryreserved_latency": { >>>>> "avgcount": 199, >>>>> "sum": 213.536439363, >>>>> "avgtime": 1.073047433 >>>>> }, >>>>> "recovering_latency": { >>>>> "avgcount": 199, >>>>> "sum": 79.007696479, >>>>> "avgtime": 0.397023600 >>>>> }, >>>>> "recovered_latency": { >>>>> "avgcount": 507, >>>>> "sum": 14.000732748, >>>>> "avgtime": 0.027614857 >>>>> }, >>>>> "clean_latency": { >>>>> "avgcount": 395, >>>>> "sum": 4574325.900371083, >>>>> "avgtime": 11580.571899673 >>>>> }, >>>>> "active_latency": { >>>>> "avgcount": 425, >>>>> "sum": 4575107.630123680, >>>>> "avgtime": 10764.959129702 >>>>> }, >>>>> "replicaactive_latency": { >>>>> "avgcount": 589, >>>>> "sum": 8975184.499049954, >>>>> "avgtime": 15238.004242869 >>>>> }, >>>>> "stray_latency": { >>>>> "avgcount": 818, >>>>> "sum": 800.729455666, >>>>> "avgtime": 0.978886865 >>>>> }, >>>>> "getinfo_latency": { >>>>> "avgcount": 550, >>>>> "sum": 15.085667048, >>>>> "avgtime": 0.027428485 >>>>> }, >>>>> "getlog_latency": { >>>>> "avgcount": 546, >>>>> "sum": 3.482175693, >>>>> "avgtime": 0.006377611 >>>>> }, >>>>> "waitactingchange_latency": { >>>>> "avgcount": 39, >>>>> "sum": 35.444551284, >>>>> "avgtime": 0.908834648 >>>>> }, >>>>> "incomplete_latency": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> }, >>>>> "down_latency": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> }, >>>>> "getmissing_latency": { >>>>> "avgcount": 507, >>>>> "sum": 6.702129624, >>>>> "avgtime": 0.013219190 >>>>> }, >>>>> "waitupthru_latency": { >>>>> "avgcount": 507, >>>>> "sum": 474.098261727, >>>>> "avgtime": 0.935105052 >>>>> }, >>>>> "notrecovering_latency": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> } >>>>> }, 
>>>>> "rocksdb": { >>>>> "get": 28320977, >>>>> "submit_transaction": 30484924, >>>>> "submit_transaction_sync": 26371957, >>>>> "get_latency": { >>>>> "avgcount": 28320977, >>>>> "sum": 325.900908733, >>>>> "avgtime": 0.000011507 >>>>> }, >>>>> "submit_latency": { >>>>> "avgcount": 30484924, >>>>> "sum": 1835.888692371, >>>>> "avgtime": 0.000060222 >>>>> }, >>>>> "submit_sync_latency": { >>>>> "avgcount": 26371957, >>>>> "sum": 1431.555230628, >>>>> "avgtime": 0.000054283 >>>>> }, >>>>> "compact": 0, >>>>> "compact_range": 0, >>>>> "compact_queue_merge": 0, >>>>> "compact_queue_len": 0, >>>>> "rocksdb_write_wal_time": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> }, >>>>> "rocksdb_write_memtable_time": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> }, >>>>> "rocksdb_write_delay_time": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> }, >>>>> "rocksdb_write_pre_and_post_time": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> } >>>>> } >>>>> } >>>>> >>>>> ----- Mail original ----- >>>>> De: "Igor Fedotov" <ifedotov@xxxxxxx> >>>>> À: "aderumier" <aderumier@xxxxxxxxx> >>>>> Cc: "Stefan Priebe, Profihost AG" <s.priebe@xxxxxxxxxxxx>, "Mark >>> Nelson" <mnelson@xxxxxxxxxx>, "Sage Weil" <sage@xxxxxxxxxxxx>, >>> "ceph-users" <ceph-users@xxxxxxxxxxxxxx>, "ceph-devel" >>> <ceph-devel@xxxxxxxxxxxxxxx> >>>>> Envoyé: Mardi 5 Février 2019 18:56:51 >>>>> Objet: Re: ceph osd commit latency increase over time, >>> until restart >>>>> On 2/4/2019 6:40 PM, Alexandre DERUMIER wrote: >>>>>>>> but I don't see l_bluestore_fragmentation counter. >>>>>>>> (but I have bluestore_fragmentation_micros) >>>>>> ok, this is the same >>>>>> >>>>>> b.add_u64(l_bluestore_fragmentation, "bluestore_fragmentation_micros", >>>>>> "How fragmented bluestore free space is (free extents / max >>> possible number of free extents) * 1000"); >>>>>> >>>>>> Here a graph on last month, with bluestore_fragmentation_micros and >>> latency, >>>>>> http://odisoweb1.odiso.net/latency_vs_fragmentation_micros.png >>>>> hmm, so fragmentation grows eventually and drops on OSD restarts, isn't >>>>> it? The same for other OSDs? >>>>> >>>>> This proves some issue with the allocator - generally fragmentation >>>>> might grow but it shouldn't reset on restart. Looks like some intervals >>>>> aren't properly merged in run-time. >>>>> >>>>> On the other side I'm not completely sure that latency degradation is >>>>> caused by that - fragmentation growth is relatively small - I don't see >>>>> how this might impact performance that high. >>>>> >>>>> Wondering if you have OSD mempool monitoring (dump_mempools command >>>>> output on admin socket) reports? Do you have any historic data? >>>>> >>>>> If not may I have current output and say a couple more samples with >>>>> 8-12 hours interval? >>>>> >>>>> >>>>> Wrt to backporting bitmap allocator to mimic - we haven't had such >>> plans >>>>> before that but I'll discuss this at BlueStore meeting shortly. 
>>>>> >>>>> >>>>> Thanks, >>>>> >>>>> Igor >>>>> >>>>>> ----- Mail original ----- >>>>>> De: "Alexandre Derumier" <aderumier@xxxxxxxxx> >>>>>> À: "Igor Fedotov" <ifedotov@xxxxxxx> >>>>>> Cc: "Stefan Priebe, Profihost AG" <s.priebe@xxxxxxxxxxxx>, "Mark >>> Nelson" <mnelson@xxxxxxxxxx>, "Sage Weil" <sage@xxxxxxxxxxxx>, >>> "ceph-users" <ceph-users@xxxxxxxxxxxxxx>, "ceph-devel" >>> <ceph-devel@xxxxxxxxxxxxxxx> >>>>>> Envoyé: Lundi 4 Février 2019 16:04:38 >>>>>> Objet: Re: ceph osd commit latency increase over time, >>> until restart >>>>>> Thanks Igor, >>>>>> >>>>>>>> Could you please collect BlueStore performance counters right >>> after OSD >>>>>>>> startup and once you get high latency. >>>>>>>> >>>>>>>> Specifically 'l_bluestore_fragmentation' parameter is of interest. >>>>>> I'm already monitoring with >>>>>> "ceph daemon osd.x perf dump ", (I have 2months history will all >>> counters) >>>>>> but I don't see l_bluestore_fragmentation counter. >>>>>> >>>>>> (but I have bluestore_fragmentation_micros) >>>>>> >>>>>> >>>>>>>> Also if you're able to rebuild the code I can probably make a simple >>>>>>>> patch to track latency and some other internal allocator's >>> paramter to >>>>>>>> make sure it's degraded and learn more details. >>>>>> Sorry, It's a critical production cluster, I can't test on it :( >>>>>> But I have a test cluster, maybe I can try to put some load on it, >>> and try to reproduce. >>>>>> >>>>>> >>>>>>>> More vigorous fix would be to backport bitmap allocator from >>> Nautilus >>>>>>>> and try the difference... >>>>>> Any plan to backport it to mimic ? (But I can wait for Nautilus) >>>>>> perf results of new bitmap allocator seem very promising from what >>> I've seen in PR. >>>>>> >>>>>> >>>>>> ----- Mail original ----- >>>>>> De: "Igor Fedotov" <ifedotov@xxxxxxx> >>>>>> À: "Alexandre Derumier" <aderumier@xxxxxxxxx>, "Stefan Priebe, >>> Profihost AG" <s.priebe@xxxxxxxxxxxx>, "Mark Nelson" <mnelson@xxxxxxxxxx> >>>>>> Cc: "Sage Weil" <sage@xxxxxxxxxxxx>, "ceph-users" >>> <ceph-users@xxxxxxxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx> >>>>>> Envoyé: Lundi 4 Février 2019 15:51:30 >>>>>> Objet: Re: ceph osd commit latency increase over time, >>> until restart >>>>>> Hi Alexandre, >>>>>> >>>>>> looks like a bug in StupidAllocator. >>>>>> >>>>>> Could you please collect BlueStore performance counters right after >>> OSD >>>>>> startup and once you get high latency. >>>>>> >>>>>> Specifically 'l_bluestore_fragmentation' parameter is of interest. >>>>>> >>>>>> Also if you're able to rebuild the code I can probably make a simple >>>>>> patch to track latency and some other internal allocator's paramter to >>>>>> make sure it's degraded and learn more details. >>>>>> >>>>>> >>>>>> More vigorous fix would be to backport bitmap allocator from Nautilus >>>>>> and try the difference... >>>>>> >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Igor >>>>>> >>>>>> >>>>>> On 2/4/2019 5:17 PM, Alexandre DERUMIER wrote: >>>>>>> Hi again, >>>>>>> >>>>>>> I speak too fast, the problem has occured again, so it's not >>> tcmalloc cache size related. 
>>>>>>> >>>>>>> I have notice something using a simple "perf top", >>>>>>> >>>>>>> each time I have this problem (I have seen exactly 4 times the >>> same behaviour), >>>>>>> when latency is bad, perf top give me : >>>>>>> >>>>>>> StupidAllocator::_aligned_len >>>>>>> and >>>>>>> >>> btree::btree_iterator<btree::btree_node<btree::btree_map_params<unsigned >>> long, unsigned long, std::less<unsigned long>, mempoo >>>>>>> l::pool_allocator<(mempool::pool_index_t)1, std::pair<unsigned >>> long const, unsigned long> >, 256> >, std::pair<unsigned long const, >>> unsigned long>&, std::pair<unsigned long >>>>>>> const, unsigned long>*>::increment_slow() >>>>>>> >>>>>>> (around 10-20% time for both) >>>>>>> >>>>>>> >>>>>>> when latency is good, I don't see them at all. >>>>>>> >>>>>>> >>>>>>> I have used the Mark wallclock profiler, here the results: >>>>>>> >>>>>>> http://odisoweb1.odiso.net/gdbpmp-ok.txt >>>>>>> >>>>>>> http://odisoweb1.odiso.net/gdbpmp-bad.txt >>>>>>> >>>>>>> >>>>>>> here an extract of the thread with btree::btree_iterator && >>> StupidAllocator::_aligned_len >>>>>>> >>>>>>> + 100.00% clone >>>>>>> + 100.00% start_thread >>>>>>> + 100.00% ShardedThreadPool::WorkThreadSharded::entry() >>>>>>> + 100.00% ShardedThreadPool::shardedthreadpool_worker(unsigned int) >>>>>>> + 100.00% OSD::ShardedOpWQ::_process(unsigned int, >>> ceph::heartbeat_handle_d*) >>>>>>> + 70.00% PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, >>> ThreadPool::TPHandle&) >>>>>>> | + 70.00% OSD::dequeue_op(boost::intrusive_ptr<PG>, >>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&) >>>>>>> | + 70.00% >>> PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>> ThreadPool::TPHandle&) >>>>>>> | + 68.00% PGBackend::handle_message(boost::intrusive_ptr<OpRequest>) >>>>>>> | | + 68.00% >>> ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>) >>>>>>> | | + 68.00% >>> ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>) >>>>>>> | | + 67.00% non-virtual thunk to >>> PrimaryLogPG::queue_transactions(std::vector<ObjectStore::Transaction, >>> std::allocator<ObjectStore::Transaction> >&, >>> boost::intrusive_ptr<OpRequest>) >>>>>>> | | | + 67.00% >>> BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, >>> std::vector<ObjectStore::Transaction, >>> std::allocator<ObjectStore::Transaction> >&, >>> boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*) >>>>>>> | | | + 66.00% >>> BlueStore::_txc_add_transaction(BlueStore::TransContext*, >>> ObjectStore::Transaction*) >>>>>>> | | | | + 66.00% BlueStore::_write(BlueStore::TransContext*, >>> boost::intrusive_ptr<BlueStore::Collection>&, >>> boost::intrusive_ptr<BlueStore::Onode>&, unsigned long, unsigned long, >>> ceph::buffer::list&, unsigned int) >>>>>>> | | | | + 66.00% BlueStore::_do_write(BlueStore::TransContext*, >>> boost::intrusive_ptr<BlueStore::Collection>&, >>> boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, >>> ceph::buffer::list&, unsigned int) >>>>>>> | | | | + 65.00% >>> BlueStore::_do_alloc_write(BlueStore::TransContext*, >>> boost::intrusive_ptr<BlueStore::Collection>, >>> boost::intrusive_ptr<BlueStore::Onode>, BlueStore::WriteContext*) >>>>>>> | | | | | + 64.00% StupidAllocator::allocate(unsigned long, >>> unsigned long, unsigned long, long, std::vector<bluestore_pextent_t, >>> mempool::pool_allocator<(mempool::pool_index_t)4, bluestore_pextent_t> >*) >>>>>>> | | | | | | + 64.00% StupidAllocator::allocate_int(unsigned long, >>> unsigned long, long, unsigned long*, 
unsigned int*) >>>>>>> | | | | | | + 34.00% >>> btree::btree_iterator<btree::btree_node<btree::btree_map_params<unsigned >>> long, unsigned long, std::less<unsigned long>, >>> mempool::pool_allocator<(mempool::pool_index_t)1, std::pair<unsigned >>> long const, unsigned long> >, 256> >, std::pair<unsigned long const, >>> unsigned long>&, std::pair<unsigned long const, unsigned >>> long>*>::increment_slow() >>>>>>> | | | | | | + 26.00% >>> StupidAllocator::_aligned_len(interval_set<unsigned long, >>> btree::btree_map<unsigned long, unsigned long, std::less<unsigned long>, >>> mempool::pool_allocator<(mempool::pool_index_t)1, std::pair<unsigned >>> long const, unsigned long> >, 256> >::iterator, unsigned long) >>>>>>> >>>>>>> >>>>>>> ----- Mail original ----- >>>>>>> De: "Alexandre Derumier" <aderumier@xxxxxxxxx> >>>>>>> À: "Stefan Priebe, Profihost AG" <s.priebe@xxxxxxxxxxxx> >>>>>>> Cc: "Sage Weil" <sage@xxxxxxxxxxxx>, "ceph-users" >>> <ceph-users@xxxxxxxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx> >>>>>>> Envoyé: Lundi 4 Février 2019 09:38:11 >>>>>>> Objet: Re: ceph osd commit latency increase over >>> time, until restart >>>>>>> Hi, >>>>>>> >>>>>>> some news: >>>>>>> >>>>>>> I have tried with different transparent hugepage values (madvise, >>> never) : no change >>>>>>> I have tried to increase bluestore_cache_size_ssd to 8G: no change >>>>>>> >>>>>>> I have tried to increase TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES to >>> 256mb : it seem to help, after 24h I'm still around 1,5ms. (need to wait >>> some more days to be sure) >>>>>>> >>>>>>> Note that this behaviour seem to happen really faster (< 2 days) >>> on my big nvme drives (6TB), >>>>>>> my others clusters user 1,6TB ssd. >>>>>>> >>>>>>> Currently I'm using only 1 osd by nvme (I don't have more than >>> 5000iops by osd), but I'll try this week with 2osd by nvme, to see if >>> it's helping. >>>>>>> >>>>>>> BTW, does somebody have already tested ceph without tcmalloc, with >>> glibc >= 2.26 (which have also thread cache) ? >>>>>>> >>>>>>> Regards, >>>>>>> >>>>>>> Alexandre >>>>>>> >>>>>>> >>>>>>> ----- Mail original ----- >>>>>>> De: "aderumier" <aderumier@xxxxxxxxx> >>>>>>> À: "Stefan Priebe, Profihost AG" <s.priebe@xxxxxxxxxxxx> >>>>>>> Cc: "Sage Weil" <sage@xxxxxxxxxxxx>, "ceph-users" >>> <ceph-users@xxxxxxxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx> >>>>>>> Envoyé: Mercredi 30 Janvier 2019 19:58:15 >>>>>>> Objet: Re: ceph osd commit latency increase over >>> time, until restart >>>>>>>>> Thanks. Is there any reason you monitor op_w_latency but not >>>>>>>>> op_r_latency but instead op_latency? >>>>>>>>> >>>>>>>>> Also why do you monitor op_w_process_latency? but not >>> op_r_process_latency? >>>>>>> I monitor read too. (I have all metrics for osd sockets, and a lot >>> of graphs). >>>>>>> I just don't see latency difference on reads. 
(or they are very >>> very small vs the write latency increase) >>>>>>> >>>>>>> >>>>>>> ----- Mail original ----- >>>>>>> De: "Stefan Priebe, Profihost AG" <s.priebe@xxxxxxxxxxxx> >>>>>>> À: "aderumier" <aderumier@xxxxxxxxx> >>>>>>> Cc: "Sage Weil" <sage@xxxxxxxxxxxx>, "ceph-users" >>> <ceph-users@xxxxxxxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx> >>>>>>> Envoyé: Mercredi 30 Janvier 2019 19:50:20 >>>>>>> Objet: Re: ceph osd commit latency increase over >>> time, until restart >>>>>>> Hi, >>>>>>> >>>>>>> Am 30.01.19 um 14:59 schrieb Alexandre DERUMIER: >>>>>>>> Hi Stefan, >>>>>>>> >>>>>>>>>> currently i'm in the process of switching back from jemalloc to >>> tcmalloc >>>>>>>>>> like suggested. This report makes me a little nervous about my >>> change. >>>>>>>> Well,I'm really not sure that it's a tcmalloc bug. >>>>>>>> maybe bluestore related (don't have filestore anymore to compare) >>>>>>>> I need to compare with bigger latencies >>>>>>>> >>>>>>>> here an example, when all osd at 20-50ms before restart, then >>> after restart (at 21:15), 1ms >>>>>>>> http://odisoweb1.odiso.net/latencybad.png >>>>>>>> >>>>>>>> I observe the latency in my guest vm too, on disks iowait. >>>>>>>> >>>>>>>> http://odisoweb1.odiso.net/latencybadvm.png >>>>>>>> >>>>>>>>>> Also i'm currently only monitoring latency for filestore osds. >>> Which >>>>>>>>>> exact values out of the daemon do you use for bluestore? >>>>>>>> here my influxdb queries: >>>>>>>> >>>>>>>> It take op_latency.sum/op_latency.avgcount on last second. >>>>>>>> >>>>>>>> >>>>>>>> SELECT non_negative_derivative(first("op_latency.sum"), >>> 1s)/non_negative_derivative(first("op_latency.avgcount"),1s) FROM "ceph" >>> WHERE "host" =~ /^([[host]])$/ AND "id" =~ /^([[osd]])$/ AND $timeFilter >>> GROUP BY time($interval), "host", "id" fill(previous) >>>>>>>> >>>>>>>> SELECT non_negative_derivative(first("op_w_latency.sum"), >>> 1s)/non_negative_derivative(first("op_w_latency.avgcount"),1s) FROM >>> "ceph" WHERE "host" =~ /^([[host]])$/ AND collection='osd' AND "id" =~ >>> /^([[osd]])$/ AND $timeFilter GROUP BY time($interval), "host", "id" >>> fill(previous) >>>>>>>> >>>>>>>> SELECT non_negative_derivative(first("op_w_process_latency.sum"), >>> 1s)/non_negative_derivative(first("op_w_process_latency.avgcount"),1s) >>> FROM "ceph" WHERE "host" =~ /^([[host]])$/ AND collection='osd' AND "id" >>> =~ /^([[osd]])$/ AND $timeFilter GROUP BY time($interval), "host", "id" >>> fill(previous) >>>>>>> Thanks. Is there any reason you monitor op_w_latency but not >>>>>>> op_r_latency but instead op_latency? >>>>>>> >>>>>>> Also why do you monitor op_w_process_latency? but not >>> op_r_process_latency? 
>>>>>>> greets, >>>>>>> Stefan >>>>>>> >>>>>>>> ----- Mail original ----- >>>>>>>> De: "Stefan Priebe, Profihost AG" <s.priebe@xxxxxxxxxxxx> >>>>>>>> À: "aderumier" <aderumier@xxxxxxxxx>, "Sage Weil" >>> <sage@xxxxxxxxxxxx> >>>>>>>> Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>, "ceph-devel" >>> <ceph-devel@xxxxxxxxxxxxxxx> >>>>>>>> Envoyé: Mercredi 30 Janvier 2019 08:45:33 >>>>>>>> Objet: Re: ceph osd commit latency increase over >>> time, until restart >>>>>>>> Hi, >>>>>>>> >>>>>>>> Am 30.01.19 um 08:33 schrieb Alexandre DERUMIER: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> here some new results, >>>>>>>>> different osd/ different cluster >>>>>>>>> >>>>>>>>> before osd restart latency was between 2-5ms >>>>>>>>> after osd restart is around 1-1.5ms >>>>>>>>> >>>>>>>>> http://odisoweb1.odiso.net/cephperf2/bad.txt (2-5ms) >>>>>>>>> http://odisoweb1.odiso.net/cephperf2/ok.txt (1-1.5ms) >>>>>>>>> http://odisoweb1.odiso.net/cephperf2/diff.txt >>>>>>>>> >>>>>>>>> From what I see in diff, the biggest difference is in tcmalloc, >>> but maybe I'm wrong. >>>>>>>>> (I'm using tcmalloc 2.5-2.2) >>>>>>>> currently i'm in the process of switching back from jemalloc to >>> tcmalloc >>>>>>>> like suggested. This report makes me a little nervous about my >>> change. >>>>>>>> Also i'm currently only monitoring latency for filestore osds. Which >>>>>>>> exact values out of the daemon do you use for bluestore? >>>>>>>> >>>>>>>> I would like to check if i see the same behaviour. >>>>>>>> >>>>>>>> Greets, >>>>>>>> Stefan >>>>>>>> >>>>>>>>> ----- Mail original ----- >>>>>>>>> De: "Sage Weil" <sage@xxxxxxxxxxxx> >>>>>>>>> À: "aderumier" <aderumier@xxxxxxxxx> >>>>>>>>> Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>, "ceph-devel" >>> <ceph-devel@xxxxxxxxxxxxxxx> >>>>>>>>> Envoyé: Vendredi 25 Janvier 2019 10:49:02 >>>>>>>>> Objet: Re: ceph osd commit latency increase over time, until >>> restart >>>>>>>>> Can you capture a perf top or perf record to see where teh CPU >>> time is >>>>>>>>> going on one of the OSDs wth a high latency? >>>>>>>>> >>>>>>>>> Thanks! >>>>>>>>> sage >>>>>>>>> >>>>>>>>> >>>>>>>>> On Fri, 25 Jan 2019, Alexandre DERUMIER wrote: >>>>>>>>> >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> I have a strange behaviour of my osd, on multiple clusters, >>>>>>>>>> >>>>>>>>>> All cluster are running mimic 13.2.1,bluestore, with ssd or >>> nvme drivers, >>>>>>>>>> workload is rbd only, with qemu-kvm vms running with librbd + >>> snapshot/rbd export-diff/snapshotdelete each day for backup >>>>>>>>>> When the osd are refreshly started, the commit latency is >>> between 0,5-1ms. >>>>>>>>>> But overtime, this latency increase slowly (maybe around 1ms by >>> day), until reaching crazy >>>>>>>>>> values like 20-200ms. >>>>>>>>>> >>>>>>>>>> Some example graphs: >>>>>>>>>> >>>>>>>>>> http://odisoweb1.odiso.net/osdlatency1.png >>>>>>>>>> http://odisoweb1.odiso.net/osdlatency2.png >>>>>>>>>> >>>>>>>>>> All osds have this behaviour, in all clusters. >>>>>>>>>> >>>>>>>>>> The latency of physical disks is ok. (Clusters are far to be >>> full loaded) >>>>>>>>>> And if I restart the osd, the latency come back to 0,5-1ms. >>>>>>>>>> >>>>>>>>>> That's remember me old tcmalloc bug, but maybe could it be a >>> bluestore memory bug ? >>>>>>>>>> Any Hints for counters/logs to check ? 
>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Regards, >>>>>>>>>> Alexandre >>>>>>>>>> >>>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> ceph-users mailing list >>>>>>>>> ceph-users@xxxxxxxxxxxxxx >>>>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
"bluestore_compressed": 39981379040, >>>>> "bluestore_compressed_allocated": 73748348928, >>>>> "bluestore_compressed_original": 165041381376, >>>>> "bluestore_onodes": 104232, >>>>> "bluestore_onode_hits": 71206874, >>>>> "bluestore_onode_misses": 1217914, >>>>> "bluestore_onode_shard_hits": 260183292, >>>>> "bluestore_onode_shard_misses": 22851573, >>>>> "bluestore_extents": 3394513, >>>>> "bluestore_blobs": 2773587, >>>>> "bluestore_buffers": 0, >>>>> "bluestore_buffer_bytes": 0, >>>>> "bluestore_buffer_hit_bytes": 62026011221, >>>>> "bluestore_buffer_miss_bytes": 995233669922, >>>>> "bluestore_write_big": 5648815, >>>>> "bluestore_write_big_bytes": 552502214656, >>>>> "bluestore_write_big_blobs": 12440992, >>>>> "bluestore_write_small": 35883770, >>>>> "bluestore_write_small_bytes": 223436965719, >>>>> "bluestore_write_small_unused": 408125, >>>>> "bluestore_write_small_deferred": 34961455, >>>>> "bluestore_write_small_pre_read": 34961455, >>>>> "bluestore_write_small_new": 514190, >>>>> "bluestore_txc": 30484924, >>>>> "bluestore_onode_reshard": 5144189, >>>>> "bluestore_blob_split": 60104, >>>>> "bluestore_extent_compress": 53347252, >>>>> "bluestore_gc_merged": 21142528, >>>>> "bluestore_read_eio": 0, >>>>> "bluestore_fragmentation_micros": 67 >>>>> }, >>>>> "finisher-defered_finisher": { >>>>> "queue_len": 0, >>>>> "complete_latency": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> } >>>>> }, >>>>> "finisher-finisher-0": { >>>>> "queue_len": 0, >>>>> "complete_latency": { >>>>> "avgcount": 26625163, >>>>> "sum": 1057.506990951, >>>>> "avgtime": 0.000039718 >>>>> } >>>>> }, >>>>> "finisher-objecter-finisher-0": { >>>>> "queue_len": 0, >>>>> "complete_latency": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> } >>>>> }, >>>>> "mutex-OSDShard.0::sdata_wait_lock": { >>>>> "wait": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> } >>>>> }, >>>>> "mutex-OSDShard.0::shard_lock": { >>>>> "wait": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> } >>>>> }, >>>>> "mutex-OSDShard.1::sdata_wait_lock": { >>>>> "wait": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> } >>>>> }, >>>>> "mutex-OSDShard.1::shard_lock": { >>>>> "wait": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> } >>>>> }, >>>>> "mutex-OSDShard.2::sdata_wait_lock": { >>>>> "wait": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> } >>>>> }, >>>>> "mutex-OSDShard.2::shard_lock": { >>>>> "wait": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> } >>>>> }, >>>>> "mutex-OSDShard.3::sdata_wait_lock": { >>>>> "wait": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> } >>>>> }, >>>>> "mutex-OSDShard.3::shard_lock": { >>>>> "wait": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> } >>>>> }, >>>>> "mutex-OSDShard.4::sdata_wait_lock": { >>>>> "wait": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> } >>>>> }, >>>>> "mutex-OSDShard.4::shard_lock": { >>>>> "wait": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> } >>>>> }, >>>>> "mutex-OSDShard.5::sdata_wait_lock": { >>>>> "wait": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> } >>>>> }, >>>>> "mutex-OSDShard.5::shard_lock": { >>>>> "wait": { 
>>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> } >>>>> }, >>>>> "mutex-OSDShard.6::sdata_wait_lock": { >>>>> "wait": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> } >>>>> }, >>>>> "mutex-OSDShard.6::shard_lock": { >>>>> "wait": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> } >>>>> }, >>>>> "mutex-OSDShard.7::sdata_wait_lock": { >>>>> "wait": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> } >>>>> }, >>>>> "mutex-OSDShard.7::shard_lock": { >>>>> "wait": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> } >>>>> }, >>>>> "objecter": { >>>>> "op_active": 0, >>>>> "op_laggy": 0, >>>>> "op_send": 0, >>>>> "op_send_bytes": 0, >>>>> "op_resend": 0, >>>>> "op_reply": 0, >>>>> "op": 0, >>>>> "op_r": 0, >>>>> "op_w": 0, >>>>> "op_rmw": 0, >>>>> "op_pg": 0, >>>>> "osdop_stat": 0, >>>>> "osdop_create": 0, >>>>> "osdop_read": 0, >>>>> "osdop_write": 0, >>>>> "osdop_writefull": 0, >>>>> "osdop_writesame": 0, >>>>> "osdop_append": 0, >>>>> "osdop_zero": 0, >>>>> "osdop_truncate": 0, >>>>> "osdop_delete": 0, >>>>> "osdop_mapext": 0, >>>>> "osdop_sparse_read": 0, >>>>> "osdop_clonerange": 0, >>>>> "osdop_getxattr": 0, >>>>> "osdop_setxattr": 0, >>>>> "osdop_cmpxattr": 0, >>>>> "osdop_rmxattr": 0, >>>>> "osdop_resetxattrs": 0, >>>>> "osdop_tmap_up": 0, >>>>> "osdop_tmap_put": 0, >>>>> "osdop_tmap_get": 0, >>>>> "osdop_call": 0, >>>>> "osdop_watch": 0, >>>>> "osdop_notify": 0, >>>>> "osdop_src_cmpxattr": 0, >>>>> "osdop_pgls": 0, >>>>> "osdop_pgls_filter": 0, >>>>> "osdop_other": 0, >>>>> "linger_active": 0, >>>>> "linger_send": 0, >>>>> "linger_resend": 0, >>>>> "linger_ping": 0, >>>>> "poolop_active": 0, >>>>> "poolop_send": 0, >>>>> "poolop_resend": 0, >>>>> "poolstat_active": 0, >>>>> "poolstat_send": 0, >>>>> "poolstat_resend": 0, >>>>> "statfs_active": 0, >>>>> "statfs_send": 0, >>>>> "statfs_resend": 0, >>>>> "command_active": 0, >>>>> "command_send": 0, >>>>> "command_resend": 0, >>>>> "map_epoch": 105913, >>>>> "map_full": 0, >>>>> "map_inc": 828, >>>>> "osd_sessions": 0, >>>>> "osd_session_open": 0, >>>>> "osd_session_close": 0, >>>>> "osd_laggy": 0, >>>>> "omap_wr": 0, >>>>> "omap_rd": 0, >>>>> "omap_del": 0 >>>>> }, >>>>> "osd": { >>>>> "op_wip": 0, >>>>> "op": 16758102, >>>>> "op_in_bytes": 238398820586, >>>>> "op_out_bytes": 165484999463, >>>>> "op_latency": { >>>>> "avgcount": 16758102, >>>>> "sum": 38242.481640842, >>>>> "avgtime": 0.002282029 >>>>> }, >>>>> "op_process_latency": { >>>>> "avgcount": 16758102, >>>>> "sum": 28644.906310687, >>>>> "avgtime": 0.001709316 >>>>> }, >>>>> "op_prepare_latency": { >>>>> "avgcount": 16761367, >>>>> "sum": 3489.856599934, >>>>> "avgtime": 0.000208208 >>>>> }, >>>>> "op_r": 6188565, >>>>> "op_r_out_bytes": 165484999463, >>>>> "op_r_latency": { >>>>> "avgcount": 6188565, >>>>> "sum": 4507.365756792, >>>>> "avgtime": 0.000728337 >>>>> }, >>>>> "op_r_process_latency": { >>>>> "avgcount": 6188565, >>>>> "sum": 942.363063429, >>>>> "avgtime": 0.000152274 >>>>> }, >>>>> "op_r_prepare_latency": { >>>>> "avgcount": 6188644, >>>>> "sum": 982.866710389, >>>>> "avgtime": 0.000158817 >>>>> }, >>>>> "op_w": 10546037, >>>>> "op_w_in_bytes": 238334329494, >>>>> "op_w_latency": { >>>>> "avgcount": 10546037, >>>>> "sum": 33160.719998316, >>>>> "avgtime": 0.003144377 >>>>> }, >>>>> "op_w_process_latency": { >>>>> "avgcount": 10546037, >>>>> "sum": 27668.702029030, >>>>> "avgtime": 0.002623611 
>>>>> }, >>>>> "op_w_prepare_latency": { >>>>> "avgcount": 10548652, >>>>> "sum": 2499.688609173, >>>>> "avgtime": 0.000236967 >>>>> }, >>>>> "op_rw": 23500, >>>>> "op_rw_in_bytes": 64491092, >>>>> "op_rw_out_bytes": 0, >>>>> "op_rw_latency": { >>>>> "avgcount": 23500, >>>>> "sum": 574.395885734, >>>>> "avgtime": 0.024442378 >>>>> }, >>>>> "op_rw_process_latency": { >>>>> "avgcount": 23500, >>>>> "sum": 33.841218228, >>>>> "avgtime": 0.001440051 >>>>> }, >>>>> "op_rw_prepare_latency": { >>>>> "avgcount": 24071, >>>>> "sum": 7.301280372, >>>>> "avgtime": 0.000303322 >>>>> }, >>>>> "op_before_queue_op_lat": { >>>>> "avgcount": 57892986, >>>>> "sum": 1502.117718889, >>>>> "avgtime": 0.000025946 >>>>> }, >>>>> "op_before_dequeue_op_lat": { >>>>> "avgcount": 58091683, >>>>> "sum": 45194.453254037, >>>>> "avgtime": 0.000777984 >>>>> }, >>>>> "subop": 19784758, >>>>> "subop_in_bytes": 547174969754, >>>>> "subop_latency": { >>>>> "avgcount": 19784758, >>>>> "sum": 13019.714424060, >>>>> "avgtime": 0.000658067 >>>>> }, >>>>> "subop_w": 19784758, >>>>> "subop_w_in_bytes": 547174969754, >>>>> "subop_w_latency": { >>>>> "avgcount": 19784758, >>>>> "sum": 13019.714424060, >>>>> "avgtime": 0.000658067 >>>>> }, >>>>> "subop_pull": 0, >>>>> "subop_pull_latency": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> }, >>>>> "subop_push": 0, >>>>> "subop_push_in_bytes": 0, >>>>> "subop_push_latency": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> }, >>>>> "pull": 0, >>>>> "push": 2003, >>>>> "push_out_bytes": 5560009728, >>>>> "recovery_ops": 1940, >>>>> "loadavg": 118, >>>>> "buffer_bytes": 0, >>>>> "history_alloc_Mbytes": 0, >>>>> "history_alloc_num": 0, >>>>> "cached_crc": 0, >>>>> "cached_crc_adjusted": 0, >>>>> "missed_crc": 0, >>>>> "numpg": 243, >>>>> "numpg_primary": 82, >>>>> "numpg_replica": 161, >>>>> "numpg_stray": 0, >>>>> "numpg_removing": 0, >>>>> "heartbeat_to_peers": 10, >>>>> "map_messages": 7013, >>>>> "map_message_epochs": 7143, >>>>> "map_message_epoch_dups": 6315, >>>>> "messages_delayed_for_map": 0, >>>>> "osd_map_cache_hit": 203309, >>>>> "osd_map_cache_miss": 33, >>>>> "osd_map_cache_miss_low": 0, >>>>> "osd_map_cache_miss_low_avg": { >>>>> "avgcount": 0, >>>>> "sum": 0 >>>>> }, >>>>> "osd_map_bl_cache_hit": 47012, >>>>> "osd_map_bl_cache_miss": 1681, >>>>> "stat_bytes": 6401248198656, >>>>> "stat_bytes_used": 3777979072512, >>>>> "stat_bytes_avail": 2623269126144, >>>>> "copyfrom": 0, >>>>> "tier_promote": 0, >>>>> "tier_flush": 0, >>>>> "tier_flush_fail": 0, >>>>> "tier_try_flush": 0, >>>>> "tier_try_flush_fail": 0, >>>>> "tier_evict": 0, >>>>> "tier_whiteout": 1631, >>>>> "tier_dirty": 22360, >>>>> "tier_clean": 0, >>>>> "tier_delay": 0, >>>>> "tier_proxy_read": 0, >>>>> "tier_proxy_write": 0, >>>>> "agent_wake": 0, >>>>> "agent_skip": 0, >>>>> "agent_flush": 0, >>>>> "agent_evict": 0, >>>>> "object_ctx_cache_hit": 16311156, >>>>> "object_ctx_cache_total": 17426393, >>>>> "op_cache_hit": 0, >>>>> "osd_tier_flush_lat": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> }, >>>>> "osd_tier_promote_lat": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> }, >>>>> "osd_tier_r_lat": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> }, >>>>> "osd_pg_info": 30483113, >>>>> "osd_pg_fastinfo": 29619885, >>>>> "osd_pg_biginfo": 81703 >>>>> }, >>>>> "recoverystate_perf": { >>>>> "initial_latency": { >>>>> "avgcount": 243, 
>>>>> "sum": 6.869296500, >>>>> "avgtime": 0.028268709 >>>>> }, >>>>> "started_latency": { >>>>> "avgcount": 1125, >>>>> "sum": 13551384.917335850, >>>>> "avgtime": 12045.675482076 >>>>> }, >>>>> "reset_latency": { >>>>> "avgcount": 1368, >>>>> "sum": 1101.727799040, >>>>> "avgtime": 0.805356578 >>>>> }, >>>>> "start_latency": { >>>>> "avgcount": 1368, >>>>> "sum": 0.002014799, >>>>> "avgtime": 0.000001472 >>>>> }, >>>>> "primary_latency": { >>>>> "avgcount": 507, >>>>> "sum": 4575560.638823428, >>>>> "avgtime": 9024.774435549 >>>>> }, >>>>> "peering_latency": { >>>>> "avgcount": 550, >>>>> "sum": 499.372283616, >>>>> "avgtime": 0.907949606 >>>>> }, >>>>> "backfilling_latency": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> }, >>>>> "waitremotebackfillreserved_latency": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> }, >>>>> "waitlocalbackfillreserved_latency": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> }, >>>>> "notbackfilling_latency": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> }, >>>>> "repnotrecovering_latency": { >>>>> "avgcount": 1009, >>>>> "sum": 8975301.082274411, >>>>> "avgtime": 8895.243887288 >>>>> }, >>>>> "repwaitrecoveryreserved_latency": { >>>>> "avgcount": 420, >>>>> "sum": 99.846056520, >>>>> "avgtime": 0.237728706 >>>>> }, >>>>> "repwaitbackfillreserved_latency": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> }, >>>>> "reprecovering_latency": { >>>>> "avgcount": 420, >>>>> "sum": 241.682764382, >>>>> "avgtime": 0.575435153 >>>>> }, >>>>> "activating_latency": { >>>>> "avgcount": 507, >>>>> "sum": 16.893347339, >>>>> "avgtime": 0.033320211 >>>>> }, >>>>> "waitlocalrecoveryreserved_latency": { >>>>> "avgcount": 199, >>>>> "sum": 672.335512769, >>>>> "avgtime": 3.378570415 >>>>> }, >>>>> "waitremoterecoveryreserved_latency": { >>>>> "avgcount": 199, >>>>> "sum": 213.536439363, >>>>> "avgtime": 1.073047433 >>>>> }, >>>>> "recovering_latency": { >>>>> "avgcount": 199, >>>>> "sum": 79.007696479, >>>>> "avgtime": 0.397023600 >>>>> }, >>>>> "recovered_latency": { >>>>> "avgcount": 507, >>>>> "sum": 14.000732748, >>>>> "avgtime": 0.027614857 >>>>> }, >>>>> "clean_latency": { >>>>> "avgcount": 395, >>>>> "sum": 4574325.900371083, >>>>> "avgtime": 11580.571899673 >>>>> }, >>>>> "active_latency": { >>>>> "avgcount": 425, >>>>> "sum": 4575107.630123680, >>>>> "avgtime": 10764.959129702 >>>>> }, >>>>> "replicaactive_latency": { >>>>> "avgcount": 589, >>>>> "sum": 8975184.499049954, >>>>> "avgtime": 15238.004242869 >>>>> }, >>>>> "stray_latency": { >>>>> "avgcount": 818, >>>>> "sum": 800.729455666, >>>>> "avgtime": 0.978886865 >>>>> }, >>>>> "getinfo_latency": { >>>>> "avgcount": 550, >>>>> "sum": 15.085667048, >>>>> "avgtime": 0.027428485 >>>>> }, >>>>> "getlog_latency": { >>>>> "avgcount": 546, >>>>> "sum": 3.482175693, >>>>> "avgtime": 0.006377611 >>>>> }, >>>>> "waitactingchange_latency": { >>>>> "avgcount": 39, >>>>> "sum": 35.444551284, >>>>> "avgtime": 0.908834648 >>>>> }, >>>>> "incomplete_latency": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> }, >>>>> "down_latency": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> }, >>>>> "getmissing_latency": { >>>>> "avgcount": 507, >>>>> "sum": 6.702129624, >>>>> "avgtime": 0.013219190 >>>>> }, >>>>> "waitupthru_latency": { >>>>> "avgcount": 507, >>>>> "sum": 474.098261727, 
>>>>> "avgtime": 0.935105052 >>>>> }, >>>>> "notrecovering_latency": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> } >>>>> }, >>>>> "rocksdb": { >>>>> "get": 28320977, >>>>> "submit_transaction": 30484924, >>>>> "submit_transaction_sync": 26371957, >>>>> "get_latency": { >>>>> "avgcount": 28320977, >>>>> "sum": 325.900908733, >>>>> "avgtime": 0.000011507 >>>>> }, >>>>> "submit_latency": { >>>>> "avgcount": 30484924, >>>>> "sum": 1835.888692371, >>>>> "avgtime": 0.000060222 >>>>> }, >>>>> "submit_sync_latency": { >>>>> "avgcount": 26371957, >>>>> "sum": 1431.555230628, >>>>> "avgtime": 0.000054283 >>>>> }, >>>>> "compact": 0, >>>>> "compact_range": 0, >>>>> "compact_queue_merge": 0, >>>>> "compact_queue_len": 0, >>>>> "rocksdb_write_wal_time": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> }, >>>>> "rocksdb_write_memtable_time": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> }, >>>>> "rocksdb_write_delay_time": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> }, >>>>> "rocksdb_write_pre_and_post_time": { >>>>> "avgcount": 0, >>>>> "sum": 0.000000000, >>>>> "avgtime": 0.000000000 >>>>> } >>>>> } >>>>> } >>>>> >>>>> ----- Mail original ----- >>>>> De: "Igor Fedotov" <ifedotov@xxxxxxx> >>>>> À: "aderumier" <aderumier@xxxxxxxxx> >>>>> Cc: "Stefan Priebe, Profihost AG" <s.priebe@xxxxxxxxxxxx>, "Mark Nelson" <mnelson@xxxxxxxxxx>, "Sage Weil" <sage@xxxxxxxxxxxx>, "ceph-users" <ceph-users@xxxxxxxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx> >>>>> Envoyé: Mardi 5 Février 2019 18:56:51 >>>>> Objet: Re: ceph osd commit latency increase over time, until restart >>>>> >>>>> On 2/4/2019 6:40 PM, Alexandre DERUMIER wrote: >>>>>>>> but I don't see l_bluestore_fragmentation counter. >>>>>>>> (but I have bluestore_fragmentation_micros) >>>>>> ok, this is the same >>>>>> >>>>>> b.add_u64(l_bluestore_fragmentation, "bluestore_fragmentation_micros", >>>>>> "How fragmented bluestore free space is (free extents / max possible number of free extents) * 1000"); >>>>>> >>>>>> >>>>>> Here a graph on last month, with bluestore_fragmentation_micros and latency, >>>>>> >>>>>> http://odisoweb1.odiso.net/latency_vs_fragmentation_micros.png >>>>> hmm, so fragmentation grows eventually and drops on OSD restarts, isn't >>>>> it? The same for other OSDs? >>>>> >>>>> This proves some issue with the allocator - generally fragmentation >>>>> might grow but it shouldn't reset on restart. Looks like some intervals >>>>> aren't properly merged in run-time. >>>>> >>>>> On the other side I'm not completely sure that latency degradation is >>>>> caused by that - fragmentation growth is relatively small - I don't see >>>>> how this might impact performance that high. >>>>> >>>>> Wondering if you have OSD mempool monitoring (dump_mempools command >>>>> output on admin socket) reports? Do you have any historic data? >>>>> >>>>> If not may I have current output and say a couple more samples with >>>>> 8-12 hours interval? >>>>> >>>>> >>>>> Wrt to backporting bitmap allocator to mimic - we haven't had such plans >>>>> before that but I'll discuss this at BlueStore meeting shortly. 
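For reference, every *_lat entry in the perf dump above is a cumulative pair (sum in seconds, avgcount), and the avgtime shown is simply sum/avgcount since OSD start or the last counter reset; e.g. op_w_latency is 33160.72 / 10546037, about 0.00314 s, which matches the printed avgtime. A minimal sketch that recomputes a few of these from a saved dump and converts bluestore_fragmentation_micros using the counter description quoted above; the file name and the choice of counters are only examples:

#!/usr/bin/env python
# Minimal sketch: recompute average latencies (sum / avgcount) from a saved
# "ceph daemon osd.N perf dump" and show the fragmentation metric.
import json

with open("osd.4.perf.json") as f:      # hypothetical file name
    perf = json.load(f)

for name in ("op_latency", "op_w_latency", "op_w_process_latency", "subop_w_latency"):
    c = perf["osd"][name]
    if c["avgcount"]:
        print("%-24s %.6f s" % (name, c["sum"] / c["avgcount"]))

# Per the description quoted above, bluestore_fragmentation_micros is
# (free extents / max possible number of free extents) * 1000,
# so a value of 67 reads as roughly 6.7%.
frag = perf["bluestore"]["bluestore_fragmentation_micros"]
print("fragmentation: %.1f%%" % (frag / 10.0))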
>>>>> >>>>> >>>>> Thanks, >>>>> >>>>> Igor >>>>> >>>>>> ----- Mail original ----- >>>>>> De: "Alexandre Derumier" <aderumier@xxxxxxxxx> >>>>>> À: "Igor Fedotov" <ifedotov@xxxxxxx> >>>>>> Cc: "Stefan Priebe, Profihost AG" <s.priebe@xxxxxxxxxxxx>, "Mark Nelson" <mnelson@xxxxxxxxxx>, "Sage Weil" <sage@xxxxxxxxxxxx>, "ceph-users" <ceph-users@xxxxxxxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx> >>>>>> Envoyé: Lundi 4 Février 2019 16:04:38 >>>>>> Objet: Re: ceph osd commit latency increase over time, until restart >>>>>> >>>>>> Thanks Igor, >>>>>> >>>>>>>> Could you please collect BlueStore performance counters right after OSD >>>>>>>> startup and once you get high latency. >>>>>>>> >>>>>>>> Specifically 'l_bluestore_fragmentation' parameter is of interest. >>>>>> I'm already monitoring with >>>>>> "ceph daemon osd.x perf dump ", (I have 2months history will all counters) >>>>>> >>>>>> but I don't see l_bluestore_fragmentation counter. >>>>>> >>>>>> (but I have bluestore_fragmentation_micros) >>>>>> >>>>>> >>>>>>>> Also if you're able to rebuild the code I can probably make a simple >>>>>>>> patch to track latency and some other internal allocator's paramter to >>>>>>>> make sure it's degraded and learn more details. >>>>>> Sorry, It's a critical production cluster, I can't test on it :( >>>>>> But I have a test cluster, maybe I can try to put some load on it, and try to reproduce. >>>>>> >>>>>> >>>>>> >>>>>>>> More vigorous fix would be to backport bitmap allocator from Nautilus >>>>>>>> and try the difference... >>>>>> Any plan to backport it to mimic ? (But I can wait for Nautilus) >>>>>> perf results of new bitmap allocator seem very promising from what I've seen in PR. >>>>>> >>>>>> >>>>>> >>>>>> ----- Mail original ----- >>>>>> De: "Igor Fedotov" <ifedotov@xxxxxxx> >>>>>> À: "Alexandre Derumier" <aderumier@xxxxxxxxx>, "Stefan Priebe, Profihost AG" <s.priebe@xxxxxxxxxxxx>, "Mark Nelson" <mnelson@xxxxxxxxxx> >>>>>> Cc: "Sage Weil" <sage@xxxxxxxxxxxx>, "ceph-users" <ceph-users@xxxxxxxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx> >>>>>> Envoyé: Lundi 4 Février 2019 15:51:30 >>>>>> Objet: Re: ceph osd commit latency increase over time, until restart >>>>>> >>>>>> Hi Alexandre, >>>>>> >>>>>> looks like a bug in StupidAllocator. >>>>>> >>>>>> Could you please collect BlueStore performance counters right after OSD >>>>>> startup and once you get high latency. >>>>>> >>>>>> Specifically 'l_bluestore_fragmentation' parameter is of interest. >>>>>> >>>>>> Also if you're able to rebuild the code I can probably make a simple >>>>>> patch to track latency and some other internal allocator's paramter to >>>>>> make sure it's degraded and learn more details. >>>>>> >>>>>> >>>>>> More vigorous fix would be to backport bitmap allocator from Nautilus >>>>>> and try the difference... >>>>>> >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Igor >>>>>> >>>>>> >>>>>> On 2/4/2019 5:17 PM, Alexandre DERUMIER wrote: >>>>>>> Hi again, >>>>>>> >>>>>>> I speak too fast, the problem has occured again, so it's not tcmalloc cache size related. 
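As a rough sketch of the periodic collection Igor asks for (counters right after startup, then again once the latency has degraded), a loop like the one below can run on the OSD host, assuming the admin socket is reachable; the osd id, the 10-minute interval and the output path are arbitrary examples:

#!/usr/bin/env python
# Minimal sketch: poll one OSD's admin socket and append the fragmentation
# counter plus the cumulative bluestore commit latency to a CSV file.
import json
import subprocess
import time

OSD = "osd.0"                          # example osd id
OUT = "/var/tmp/%s_frag.csv" % OSD     # example output path

while True:
    raw = subprocess.check_output(["ceph", "daemon", OSD, "perf", "dump"])
    perf = json.loads(raw.decode("utf-8"))
    frag = perf["bluestore"]["bluestore_fragmentation_micros"]
    commit = perf["bluestore"]["commit_lat"]
    # commit_lat is cumulative since OSD start (or the last perf reset), so the
    # average drifts slowly; deltas between rows give the per-interval figure.
    avg = commit["sum"] / commit["avgcount"] if commit["avgcount"] else 0.0
    with open(OUT, "a") as f:
        f.write("%d,%d,%.6f\n" % (time.time(), frag, avg))
    time.sleep(600)                    # example: one sample every 10 minutes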
>>>>>>> >>>>>>> >>>>>>> I have notice something using a simple "perf top", >>>>>>> >>>>>>> each time I have this problem (I have seen exactly 4 times the same behaviour), >>>>>>> >>>>>>> when latency is bad, perf top give me : >>>>>>> >>>>>>> StupidAllocator::_aligned_len >>>>>>> and >>>>>>> btree::btree_iterator<btree::btree_node<btree::btree_map_params<unsigned long, unsigned long, std::less<unsigned long>, mempoo >>>>>>> l::pool_allocator<(mempool::pool_index_t)1, std::pair<unsigned long const, unsigned long> >, 256> >, std::pair<unsigned long const, unsigned long>&, std::pair<unsigned long >>>>>>> const, unsigned long>*>::increment_slow() >>>>>>> >>>>>>> (around 10-20% time for both) >>>>>>> >>>>>>> >>>>>>> when latency is good, I don't see them at all. >>>>>>> >>>>>>> >>>>>>> I have used the Mark wallclock profiler, here the results: >>>>>>> >>>>>>> http://odisoweb1.odiso.net/gdbpmp-ok.txt >>>>>>> >>>>>>> http://odisoweb1.odiso.net/gdbpmp-bad.txt >>>>>>> >>>>>>> >>>>>>> here an extract of the thread with btree::btree_iterator && StupidAllocator::_aligned_len >>>>>>> >>>>>>> >>>>>>> + 100.00% clone >>>>>>> + 100.00% start_thread >>>>>>> + 100.00% ShardedThreadPool::WorkThreadSharded::entry() >>>>>>> + 100.00% ShardedThreadPool::shardedthreadpool_worker(unsigned int) >>>>>>> + 100.00% OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*) >>>>>>> + 70.00% PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&) >>>>>>> | + 70.00% OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&) >>>>>>> | + 70.00% PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&) >>>>>>> | + 68.00% PGBackend::handle_message(boost::intrusive_ptr<OpRequest>) >>>>>>> | | + 68.00% ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>) >>>>>>> | | + 68.00% ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>) >>>>>>> | | + 67.00% non-virtual thunk to PrimaryLogPG::queue_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<OpRequest>) >>>>>>> | | | + 67.00% BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*) >>>>>>> | | | + 66.00% BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*) >>>>>>> | | | | + 66.00% BlueStore::_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&, unsigned long, unsigned long, ceph::buffer::list&, unsigned int) >>>>>>> | | | | + 66.00% BlueStore::_do_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::list&, unsigned int) >>>>>>> | | | | + 65.00% BlueStore::_do_alloc_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>, boost::intrusive_ptr<BlueStore::Onode>, BlueStore::WriteContext*) >>>>>>> | | | | | + 64.00% StupidAllocator::allocate(unsigned long, unsigned long, unsigned long, long, std::vector<bluestore_pextent_t, mempool::pool_allocator<(mempool::pool_index_t)4, bluestore_pextent_t> >*) >>>>>>> | | | | | | + 64.00% StupidAllocator::allocate_int(unsigned long, unsigned long, long, unsigned long*, unsigned int*) >>>>>>> | | | | | | + 34.00% 
btree::btree_iterator<btree::btree_node<btree::btree_map_params<unsigned long, unsigned long, std::less<unsigned long>, mempool::pool_allocator<(mempool::pool_index_t)1, std::pair<unsigned long const, unsigned long> >, 256> >, std::pair<unsigned long const, unsigned long>&, std::pair<unsigned long const, unsigned long>*>::increment_slow() >>>>>>> | | | | | | + 26.00% StupidAllocator::_aligned_len(interval_set<unsigned long, btree::btree_map<unsigned long, unsigned long, std::less<unsigned long>, mempool::pool_allocator<(mempool::pool_index_t)1, std::pair<unsigned long const, unsigned long> >, 256> >::iterator, unsigned long) >>>>>>> >>>>>>> >>>>>>> >>>>>>> ----- Mail original ----- >>>>>>> De: "Alexandre Derumier" <aderumier@xxxxxxxxx> >>>>>>> À: "Stefan Priebe, Profihost AG" <s.priebe@xxxxxxxxxxxx> >>>>>>> Cc: "Sage Weil" <sage@xxxxxxxxxxxx>, "ceph-users" <ceph-users@xxxxxxxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx> >>>>>>> Envoyé: Lundi 4 Février 2019 09:38:11 >>>>>>> Objet: Re: ceph osd commit latency increase over time, until restart >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> some news: >>>>>>> >>>>>>> I have tried with different transparent hugepage values (madvise, never) : no change >>>>>>> >>>>>>> I have tried to increase bluestore_cache_size_ssd to 8G: no change >>>>>>> >>>>>>> I have tried to increase TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES to 256mb : it seem to help, after 24h I'm still around 1,5ms. (need to wait some more days to be sure) >>>>>>> >>>>>>> >>>>>>> Note that this behaviour seem to happen really faster (< 2 days) on my big nvme drives (6TB), >>>>>>> my others clusters user 1,6TB ssd. >>>>>>> >>>>>>> Currently I'm using only 1 osd by nvme (I don't have more than 5000iops by osd), but I'll try this week with 2osd by nvme, to see if it's helping. >>>>>>> >>>>>>> >>>>>>> BTW, does somebody have already tested ceph without tcmalloc, with glibc >= 2.26 (which have also thread cache) ? >>>>>>> >>>>>>> >>>>>>> Regards, >>>>>>> >>>>>>> Alexandre >>>>>>> >>>>>>> >>>>>>> ----- Mail original ----- >>>>>>> De: "aderumier" <aderumier@xxxxxxxxx> >>>>>>> À: "Stefan Priebe, Profihost AG" <s.priebe@xxxxxxxxxxxx> >>>>>>> Cc: "Sage Weil" <sage@xxxxxxxxxxxx>, "ceph-users" <ceph-users@xxxxxxxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx> >>>>>>> Envoyé: Mercredi 30 Janvier 2019 19:58:15 >>>>>>> Objet: Re: ceph osd commit latency increase over time, until restart >>>>>>> >>>>>>>>> Thanks. Is there any reason you monitor op_w_latency but not >>>>>>>>> op_r_latency but instead op_latency? >>>>>>>>> >>>>>>>>> Also why do you monitor op_w_process_latency? but not op_r_process_latency? >>>>>>> I monitor read too. (I have all metrics for osd sockets, and a lot of graphs). >>>>>>> >>>>>>> I just don't see latency difference on reads. (or they are very very small vs the write latency increase) >>>>>>> >>>>>>> >>>>>>> >>>>>>> ----- Mail original ----- >>>>>>> De: "Stefan Priebe, Profihost AG" <s.priebe@xxxxxxxxxxxx> >>>>>>> À: "aderumier" <aderumier@xxxxxxxxx> >>>>>>> Cc: "Sage Weil" <sage@xxxxxxxxxxxx>, "ceph-users" <ceph-users@xxxxxxxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx> >>>>>>> Envoyé: Mercredi 30 Janvier 2019 19:50:20 >>>>>>> Objet: Re: ceph osd commit latency increase over time, until restart >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> Am 30.01.19 um 14:59 schrieb Alexandre DERUMIER: >>>>>>>> Hi Stefan, >>>>>>>> >>>>>>>>>> currently i'm in the process of switching back from jemalloc to tcmalloc >>>>>>>>>> like suggested. This report makes me a little nervous about my change. 
>>>>>>>> Well,I'm really not sure that it's a tcmalloc bug. >>>>>>>> maybe bluestore related (don't have filestore anymore to compare) >>>>>>>> I need to compare with bigger latencies >>>>>>>> >>>>>>>> here an example, when all osd at 20-50ms before restart, then after restart (at 21:15), 1ms >>>>>>>> http://odisoweb1.odiso.net/latencybad.png >>>>>>>> >>>>>>>> I observe the latency in my guest vm too, on disks iowait. >>>>>>>> >>>>>>>> http://odisoweb1.odiso.net/latencybadvm.png >>>>>>>> >>>>>>>>>> Also i'm currently only monitoring latency for filestore osds. Which >>>>>>>>>> exact values out of the daemon do you use for bluestore? >>>>>>>> here my influxdb queries: >>>>>>>> >>>>>>>> It take op_latency.sum/op_latency.avgcount on last second. >>>>>>>> >>>>>>>> >>>>>>>> SELECT non_negative_derivative(first("op_latency.sum"), 1s)/non_negative_derivative(first("op_latency.avgcount"),1s) FROM "ceph" WHERE "host" =~ /^([[host]])$/ AND "id" =~ /^([[osd]])$/ AND $timeFilter GROUP BY time($interval), "host", "id" fill(previous) >>>>>>>> >>>>>>>> >>>>>>>> SELECT non_negative_derivative(first("op_w_latency.sum"), 1s)/non_negative_derivative(first("op_w_latency.avgcount"),1s) FROM "ceph" WHERE "host" =~ /^([[host]])$/ AND collection='osd' AND "id" =~ /^([[osd]])$/ AND $timeFilter GROUP BY time($interval), "host", "id" fill(previous) >>>>>>>> >>>>>>>> >>>>>>>> SELECT non_negative_derivative(first("op_w_process_latency.sum"), 1s)/non_negative_derivative(first("op_w_process_latency.avgcount"),1s) FROM "ceph" WHERE "host" =~ /^([[host]])$/ AND collection='osd' AND "id" =~ /^([[osd]])$/ AND $timeFilter GROUP BY time($interval), "host", "id" fill(previous) >>>>>>> Thanks. Is there any reason you monitor op_w_latency but not >>>>>>> op_r_latency but instead op_latency? >>>>>>> >>>>>>> Also why do you monitor op_w_process_latency? but not op_r_process_latency? >>>>>>> >>>>>>> greets, >>>>>>> Stefan >>>>>>> >>>>>>>> ----- Mail original ----- >>>>>>>> De: "Stefan Priebe, Profihost AG" <s.priebe@xxxxxxxxxxxx> >>>>>>>> À: "aderumier" <aderumier@xxxxxxxxx>, "Sage Weil" <sage@xxxxxxxxxxxx> >>>>>>>> Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx> >>>>>>>> Envoyé: Mercredi 30 Janvier 2019 08:45:33 >>>>>>>> Objet: Re: ceph osd commit latency increase over time, until restart >>>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> Am 30.01.19 um 08:33 schrieb Alexandre DERUMIER: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> here some new results, >>>>>>>>> different osd/ different cluster >>>>>>>>> >>>>>>>>> before osd restart latency was between 2-5ms >>>>>>>>> after osd restart is around 1-1.5ms >>>>>>>>> >>>>>>>>> http://odisoweb1.odiso.net/cephperf2/bad.txt (2-5ms) >>>>>>>>> http://odisoweb1.odiso.net/cephperf2/ok.txt (1-1.5ms) >>>>>>>>> http://odisoweb1.odiso.net/cephperf2/diff.txt >>>>>>>>> >>>>>>>>> From what I see in diff, the biggest difference is in tcmalloc, but maybe I'm wrong. >>>>>>>>> (I'm using tcmalloc 2.5-2.2) >>>>>>>> currently i'm in the process of switching back from jemalloc to tcmalloc >>>>>>>> like suggested. This report makes me a little nervous about my change. >>>>>>>> >>>>>>>> Also i'm currently only monitoring latency for filestore osds. Which >>>>>>>> exact values out of the daemon do you use for bluestore? >>>>>>>> >>>>>>>> I would like to check if i see the same behaviour. 
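For reference, the non_negative_derivative queries above boil down to taking two consecutive samples of a counter and dividing the growth of .sum by the growth of .avgcount, which gives the average latency over that interval rather than since OSD start. A minimal sketch of the same calculation from two saved perf dumps; the sample file names are hypothetical:

#!/usr/bin/env python
# Minimal sketch of what the InfluxDB queries compute: average op_w_latency
# over the interval between two perf dumps, i.e. delta(sum) / delta(avgcount).
import json

def counter(path, name):
    with open(path) as f:
        return json.load(f)["osd"][name]

prev = counter("perf_t0.json", "op_w_latency")   # hypothetical sample files
curr = counter("perf_t1.json", "op_w_latency")

d_sum = curr["sum"] - prev["sum"]
d_count = curr["avgcount"] - prev["avgcount"]
if d_count > 0:
    print("op_w_latency over the interval: %.3f ms" % (1000.0 * d_sum / d_count))
else:
    print("no write ops completed in this interval")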
>>>>>>>> >>>>>>>> Greets, >>>>>>>> Stefan >>>>>>>> >>>>>>>>> ----- Mail original ----- >>>>>>>>> De: "Sage Weil" <sage@xxxxxxxxxxxx> >>>>>>>>> À: "aderumier" <aderumier@xxxxxxxxx> >>>>>>>>> Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx> >>>>>>>>> Envoyé: Vendredi 25 Janvier 2019 10:49:02 >>>>>>>>> Objet: Re: ceph osd commit latency increase over time, until restart >>>>>>>>> >>>>>>>>> Can you capture a perf top or perf record to see where teh CPU time is >>>>>>>>> going on one of the OSDs wth a high latency? >>>>>>>>> >>>>>>>>> Thanks! >>>>>>>>> sage >>>>>>>>> >>>>>>>>> >>>>>>>>> On Fri, 25 Jan 2019, Alexandre DERUMIER wrote: >>>>>>>>> >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> I have a strange behaviour of my osd, on multiple clusters, >>>>>>>>>> >>>>>>>>>> All cluster are running mimic 13.2.1,bluestore, with ssd or nvme drivers, >>>>>>>>>> workload is rbd only, with qemu-kvm vms running with librbd + snapshot/rbd export-diff/snapshotdelete each day for backup >>>>>>>>>> >>>>>>>>>> When the osd are refreshly started, the commit latency is between 0,5-1ms. >>>>>>>>>> >>>>>>>>>> But overtime, this latency increase slowly (maybe around 1ms by day), until reaching crazy >>>>>>>>>> values like 20-200ms. >>>>>>>>>> >>>>>>>>>> Some example graphs: >>>>>>>>>> >>>>>>>>>> http://odisoweb1.odiso.net/osdlatency1.png >>>>>>>>>> http://odisoweb1.odiso.net/osdlatency2.png >>>>>>>>>> >>>>>>>>>> All osds have this behaviour, in all clusters. >>>>>>>>>> >>>>>>>>>> The latency of physical disks is ok. (Clusters are far to be full loaded) >>>>>>>>>> >>>>>>>>>> And if I restart the osd, the latency come back to 0,5-1ms. >>>>>>>>>> >>>>>>>>>> That's remember me old tcmalloc bug, but maybe could it be a bluestore memory bug ? >>>>>>>>>> >>>>>>>>>> Any Hints for counters/logs to check ? >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Regards, >>>>>>>>>> >>>>>>>>>> Alexandre >>>>>>>>>> >>>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> ceph-users mailing list >>>>>>>>> ceph-users@xxxxxxxxxxxxxx >>>>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >>>>>>>>> >>>>> >>>> >>>> >>> >>> >>> _______________________________________________ >>> ceph-users mailing list >>> ceph-users@xxxxxxxxxxxxxx >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >>> _______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
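As a rough sketch of the capture Sage asks for above (a perf record of a high-latency OSD, to see where the CPU time goes), assuming perf is installed and this is run as root on the OSD host; the pid must be the slow ceph-osd process and the 30-second duration is arbitrary:

#!/usr/bin/env python
# Minimal sketch: record a 30 s call-graph profile of one ceph-osd process
# with perf and print the report.
import subprocess
import sys

pid = sys.argv[1]                      # pid of the high-latency ceph-osd
subprocess.check_call(["perf", "record", "-g", "-p", pid, "--", "sleep", "30"])
report = subprocess.check_output(["perf", "report", "--stdio"])
# In the bad case described in this thread, symbols such as
# StupidAllocator::_aligned_len and btree_iterator<...>::increment_slow
# are the ones that show up near the top of this report.
sys.stdout.write(report.decode("utf-8", "replace"))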