Hi all,
I have built a testing cluster with 4 hosts, with 1 SSD and 11 HDDs on each host.
Running ceph version 14.2.10 (b340acf629a010a74d90da5782a2c5fe0b54ac20) nautilus (stable) on Ubuntu.
Because we want to store small objects, I set bluestore_min_alloc_size to 8192 (this may be important in this case).
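If it matters, I applied it roughly like this before creating the OSDs (a sketch from memory; as far as I know the value is baked in at OSD mkfs time, so it only affects OSDs created afterwards):
```
# set before OSD creation; bluestore_min_alloc_size only takes effect at OSD mkfs time
ceph config set osd bluestore_min_alloc_size 8192
```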
I filled it through the rados gateway with approximately a billion small objects. After the tests I changed min_alloc_size back and deleted the rados pools
(to empty the whole cluster), then waited for the cluster to delete the data from the OSDs, but that destabilized the cluster. I never reached
HEALTH_OK. OSDs were being killed in random order. I can start them again, but they drop out of the cluster again with:
```
-18> 2020-09-05 22:11:19.430 7f7a3ee40700 5 prioritycache tune_memory target: 3221225472 mapped: 2064359424 unmapped: 8708096 heap: 2073067520 old mem: 1932735282 new mem: 1932735282
-17> 2020-09-05 22:11:19.430 7f7a3ee40700 5 bluestore.MempoolThread(0x555a9d0efb70) _trim_shards cache_size: 1932735282 kv_alloc: 1644167168 kv_used: 1644135504 meta_alloc: 142606336 meta_used: 143595 data_alloc: 142606336 data_used: 98304
-16> 2020-09-05 22:11:20.434 7f7a3ee40700 5 prioritycache tune_memory target: 3221225472 mapped: 2064941056 unmapped: 8126464 heap: 2073067520 old mem: 1932735282 new mem: 1932735282
-15> 2020-09-05 22:11:21.434 7f7a3ee40700 5 prioritycache tune_memory target: 3221225472 mapped: 2064359424 unmapped: 8708096 heap: 2073067520 old mem: 1932735282 new mem: 1932735282
-14> 2020-09-05 22:11:22.258 7f7a2b81f700 5 osd.42 103257 heartbeat osd_stat(store_statfs(0x1ce18290000/0x2d08c0000/0x1d180000000, data 0x23143355/0x974a0000, compress 0x0/0x0/0x0, omap 0x1f11e, meta 0x2d08a0ee2), peers [3,4,6,7,8,11,12,13,14,16,17,18,19,21,23,24,25,27,28,29,31,32,33,34,41,43] op hist [])
-13> 2020-09-05 22:11:22.438 7f7a3ee40700 5 prioritycache tune_memory target: 3221225472 mapped: 2064359424 unmapped: 8708096 heap: 2073067520 old mem: 1932735282 new mem: 1932735282
-12> 2020-09-05 22:11:23.442 7f7a3ee40700 5 prioritycache tune_memory target: 3221225472 mapped: 2064359424 unmapped: 8708096 heap: 2073067520 old mem: 1932735282 new mem: 1932735282
-11> 2020-09-05 22:11:24.442 7f7a3ee40700 5 prioritycache tune_memory target: 3221225472 mapped: 2064285696 unmapped: 8781824 heap: 2073067520 old mem: 1932735282 new mem: 1932735282
-10> 2020-09-05 22:11:24.442 7f7a3ee40700 5 bluestore.MempoolThread(0x555a9d0efb70) _trim_shards cache_size: 1932735282 kv_alloc: 1644167168 kv_used: 1644119840 meta_alloc: 142606336 meta_used: 143595 data_alloc: 142606336 data_used: 98304
-9> 2020-09-05 22:11:24.442 7f7a2e024700 0 bluestore(/var/lib/ceph/osd/ceph-42) log_latency_fn slow operation observed for _collection_list, latency = 151.113s, lat = 2m cid =5.47_head start #5:e2000000::::0# end #MAX# max 2147483647
-8> 2020-09-05 22:11:24.446 7f7a2e024700 1 heartbeat_map reset_timeout 'OSD::osd_op_tp thread 0x7f7a2e024700' had timed out after 15
-7> 2020-09-05 22:11:24.446 7f7a2e024700 1 heartbeat_map reset_timeout 'OSD::osd_op_tp thread 0x7f7a2e024700' had suicide timed out after 150
-6> 2020-09-05 22:11:24.446 7f7a4c2a4700 10 monclient: get_auth_request con 0x555b15d07680 auth_method 0
-5> 2020-09-05 22:11:24.446 7f7a3c494700 2 osd.42 103257 ms_handle_reset con 0x555b15963600 session 0x555a9f9d6d00
-4> 2020-09-05 22:11:24.446 7f7a3c494700 2 osd.42 103257 ms_handle_reset con 0x555b15961b00 session 0x555a9f9d7980
-3> 2020-09-05 22:11:24.446 7f7a3c494700 2 osd.42 103257 ms_handle_reset con 0x555b15963a80 session 0x555a9f9d6a80
-2> 2020-09-05 22:11:24.446 7f7a3c494700 2 osd.42 103257 ms_handle_reset con 0x555b15960480 session 0x555a9f9d6f80
-1> 2020-09-05 22:11:24.446 7f7a3c494700 3 osd.42 103257 handle_osd_map epochs [103258,103259], i have 103257, src has [83902,103259]
0> 2020-09-05 22:11:24.450 7f7a2e024700 -1 *** Caught signal (Aborted) **
```
I have approximately 12 OSDs down with this error.
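From the log it looks like _collection_list on PG 5.47 took 151 s, just over the op thread's 150 s suicide timeout, so the OSD aborted itself. If that is the whole story, perhaps raising the timeouts would have let the OSDs survive the mass delete; something like this (an untested idea, the values are guesses):
```
# untested: give slow collection listings more headroom during a mass delete
ceph config set osd osd_op_thread_timeout 60
ceph config set osd osd_op_thread_suicide_timeout 600
```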
I decided to wipe the problematic OSDs, so I can no longer debug them, but I'm curious what I did wrong (deleting a pool with many small objects?) and what I
should do differently next time.
I have done this before, though without a billion objects and without changing bluestore_min_alloc_size, and it worked without problems.
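For completeness, the deletion itself was just the standard pool removal, roughly like this (the pool names here are the default RGW ones, for illustration only):
```
# the mon must be told to allow pool deletion first
ceph config set mon mon_allow_pool_delete true
ceph osd pool delete default.rgw.buckets.data default.rgw.buckets.data --yes-i-really-really-mean-it
```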
With regards
Jan Pekar
--
============
Ing. Jan Pekař
jan.pekar@xxxxxxxxx
----
Imatic | Jagellonská 14 | Praha 3 | 130 00
http://www.imatic.cz | +420326555326
============
--
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx