Hi Boris,

you have most likely hit https://tracker.ceph.com/issues/47751

It's fixed in the upcoming Nautilus release, but v14.2.16 still lacks the fix.

As a workaround you might want to switch back to the bitmap or avl allocator.

Thanks,
Igor

On 2/17/2021 12:36 PM, Boris Behrens wrote:
Hi,

currently we are experiencing OSD daemon crashes and I can't pin down the issue. I hope someone can help me with it.

* We operate multiple clusters (440 SSD - 1 PB, 36 SSD - 126 TB, 40 SSD - 100 TB, 84 HDD - 680 TB)
* All clusters were updated around the same time (2021-02-03)
* We restarted ALL ceph daemons (systemctl restart ceph.target) on 2021-02-11 after we added OOMScoreAdjust=-900 to all the service files.

Now, in our main cluster (440 SSD with 1 PB), the OSD daemons have begun to crash:

# ceph crash ls
ID                                                                ENTITY  NEW
2020-03-06_17:37:54.031675Z_0bbbb807-ff2f-46df-9508-58d319b89bd6  osd.397
2020-05-28_12:23:27.677741Z_061f2449-9a36-4747-a2f8-624e72cd1ad0  osd.410
2021-02-05_07:03:35.943384Z_dffab245-4788-4de2-a677-76b735d5fc01  osd.403
2021-02-15_15:41:27.934194Z_97b57f8f-58f2-4390-9d3e-993874e0e000  osd.395
2021-02-15_18:01:19.774879Z_18160e65-4659-451f-8aae-def2984f1f29  osd.178
2021-02-17_04:51:05.101052Z_9f04c6e8-d0c7-442c-9a38-33d5164d2a83  osd.384

osd.384 and osd.395 are on the same node, which had some memory issues that we fixed on 2021-02-16_12:00:00.
osd.384 had been marked out for >24h when the daemon crashed, and there are no more misplaced objects in the cluster.
Here is the latest crash dump:

--- begin dump of recent events ---
-9999> 2021-02-17 03:31:31.305 7fcf7e136700 1 do_command 'perf dump' 'result is 30067 bytes
-9998> 2021-02-17 03:31:31.626 7fcf73be6700 5 prioritycache tune_memory target: 4294967296 mapped: 2882789376 unmapped: 956792832 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
-9997> 2021-02-17 03:31:32.634 7fcf73be6700 5 prioritycache tune_memory target: 4294967296 mapped: 2882789376 unmapped: 956792832 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
-9996> 2021-02-17 03:31:33.639 7fcf73be6700 5 prioritycache tune_memory target: 4294967296 mapped: 2882789376 unmapped: 956792832 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
-9995> 2021-02-17 03:31:34.647 7fcf73be6700 5 prioritycache tune_memory target: 4294967296 mapped: 2882789376 unmapped: 956792832 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
-9994> 2021-02-17 03:31:35.651 7fcf73be6700 5 prioritycache tune_memory target: 4294967296 mapped: 2882789376 unmapped: 956792832 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
-9993> 2021-02-17 03:31:36.654 7fcf73be6700 5 prioritycache tune_memory target: 4294967296 mapped: 2882789376 unmapped: 956792832 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
-9992> 2021-02-17 03:31:37.657 7fcf73be6700 5 prioritycache tune_memory target: 4294967296 mapped: 2882789376 unmapped: 956792832 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
-9991> 2021-02-17 03:31:38.676 7fcf73be6700 5 prioritycache tune_memory target: 4294967296 mapped: 2882789376 unmapped: 956792832 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
-9990> 2021-02-17 03:31:39.680 7fcf73be6700 5 prioritycache tune_memory target: 4294967296 mapped: 2882789376 unmapped: 956792832 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
-9989> 2021-02-17 03:31:40.684 7fcf73be6700 5 prioritycache tune_memory target: 4294967296 mapped: 2882789376 unmapped: 956792832 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
-9988> 2021-02-17 03:31:41.193 7fcf7e136700 1 do_command 'perf dump' '
-9987> 2021-02-17 03:31:41.193 7fcf7e136700 1 do_command 'perf dump' 'result is 30067 bytes
<snip>
-31> 2021-02-17 05:50:41.158 7fcf7e136700 1 do_command 'perf dump' '
-30> 2021-02-17 05:50:41.159 7fcf7e136700 1 do_command 'perf dump' 'result is 30070 bytes
-29> 2021-02-17 05:50:41.804 7fcf73be6700 5 prioritycache tune_memory target: 4294967296 mapped: 2851831808 unmapped: 987750400 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
-28> 2021-02-17 05:50:42.813 7fcf73be6700 5 prioritycache tune_memory target: 4294967296 mapped: 2851831808 unmapped: 987750400 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
-27> 2021-02-17 05:50:43.820 7fcf73be6700 5 prioritycache tune_memory target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
-26> 2021-02-17 05:50:44.825 7fcf73be6700 5 prioritycache tune_memory target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
-25> 2021-02-17 05:50:45.831 7fcf73be6700 5 prioritycache tune_memory target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
-24> 2021-02-17 05:50:46.837 7fcf73be6700 5 prioritycache tune_memory target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
-23> 2021-02-17 05:50:47.840 7fcf73be6700 5 prioritycache tune_memory target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
-22> 2021-02-17 05:50:48.843 7fcf73be6700 5 prioritycache tune_memory target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
-21> 2021-02-17 05:50:49.847 7fcf73be6700 5 prioritycache tune_memory target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
-20> 2021-02-17 05:50:50.853 7fcf73be6700 5 prioritycache tune_memory target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
-19> 2021-02-17 05:50:51.524 7fcf7e136700 1 do_command 'perf dump' '
-18> 2021-02-17 05:50:51.525 7fcf7e136700 1 do_command 'perf dump' 'result is 30070 bytes
-17> 2021-02-17 05:50:51.859 7fcf73be6700 5 prioritycache tune_memory target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
-16> 2021-02-17 05:50:52.862 7fcf73be6700 5 prioritycache tune_memory target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
-15> 2021-02-17 05:50:53.871 7fcf73be6700 5 prioritycache tune_memory target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
-14> 2021-02-17 05:50:54.875 7fcf73be6700 5 prioritycache tune_memory target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
-13> 2021-02-17 05:50:55.886 7fcf73be6700 5 prioritycache tune_memory target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
-12> 2021-02-17 05:50:56.891 7fcf73be6700 5 prioritycache tune_memory target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
-11> 2021-02-17 05:50:57.905 7fcf73be6700 5 prioritycache tune_memory target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
-10> 2021-02-17 05:50:58.911 7fcf73be6700 5 prioritycache tune_memory target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
-9> 2021-02-17 05:50:59.917 7fcf73be6700 5 prioritycache tune_memory target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
-8> 2021-02-17 05:51:00.929 7fcf73be6700 5 prioritycache tune_memory target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
-7> 2021-02-17 05:51:01.566 7fcf7e136700 1 do_command 'perf dump' '
-6> 2021-02-17 05:51:01.567 7fcf7e136700 1 do_command 'perf dump' 'result is 30070 bytes
-5> 2021-02-17 05:51:01.935 7fcf73be6700 5 prioritycache tune_memory target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
-4> 2021-02-17 05:51:02.943 7fcf73be6700 5 prioritycache tune_memory target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
-3> 2021-02-17 05:51:03.949 7fcf73be6700 5 prioritycache tune_memory target: 4294967296 mapped: 2851102720 unmapped: 988479488 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
-2> 2021-02-17 05:51:04.967 7fcf73be6700 5 prioritycache tune_memory target: 4294967296 mapped: 2851102720 unmapped: 988479488 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
-1> 2021-02-17 05:51:05.091 7fcf743e7700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.16/rpm/el7/BUILD/ceph-14.2.16/src/os/bluestore/fastbmap_allocator_impl.h: In function 'uint64_t AllocatorLevel02<T>::claim_free_to_right(uint64_t) [with L1 = AllocatorLevel01Loose; uint64_t = long unsigned int]' thread 7fcf743e7700 time 2021-02-17 05:51:04.998475
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.16/rpm/el7/BUILD/ceph-14.2.16/src/os/bluestore/fastbmap_allocator_impl.h: 572: FAILED ceph_assert(available >= allocated)

 ceph version 14.2.16 (762032d6f509d5e7ee7dc008d80fe9c87086603c) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x561c84cc2c7d]
 2: (()+0x4d8e45) [0x561c84cc2e45]
 3: (HybridAllocator::_add_to_tree(unsigned long, unsigned long)+0x49e) [0x561c853167de]
 4: (AvlAllocator::_release(interval_set<unsigned long, std::map<unsigned long, unsigned long, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long> > > > const&)+0x60) [0x561c85310b20]
 5: (HybridAllocator::release(interval_set<unsigned long, std::map<unsigned long, unsigned long, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long> > > > const&)+0x3a) [0x561c853143ca]
 6: (BlueStore::_txc_release_alloc(BlueStore::TransContext*)+0x5f) [0x561c851ee83f]
 7: (BlueStore::_txc_finish(BlueStore::TransContext*)+0x1be) [0x561c8522f4ae]
 8: (BlueStore::_txc_state_proc(BlueStore::TransContext*)+0xaa) [0x561c8522fe9a]
 9: (BlueStore::_kv_finalize_thread()+0x604) [0x561c85232ed4]
 10: (BlueStore::KVFinalizeThread::entry()+0xd) [0x561c852625ed]
 11: (()+0x7ea5) [0x7fcf840a2ea5]
 12: (clone()+0x6d) [0x7fcf82f6596d]

0> 2021-02-17 05:51:05.145 7fcf743e7700 -1 *** Caught signal (Aborted) **
 in thread 7fcf743e7700 thread_name:bstore_kv_final

 ceph version 14.2.16 (762032d6f509d5e7ee7dc008d80fe9c87086603c) nautilus (stable)
 1: (()+0xf630) [0x7fcf840aa630]
 2: (gsignal()+0x37) [0x7fcf82e9d387]
 3: (abort()+0x148) [0x7fcf82e9ea78]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x199) [0x561c84cc2ccc]
 5: (()+0x4d8e45) [0x561c84cc2e45]
 6: (HybridAllocator::_add_to_tree(unsigned long, unsigned long)+0x49e) [0x561c853167de]
 7: (AvlAllocator::_release(interval_set<unsigned long, std::map<unsigned long, unsigned long, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long> > > > const&)+0x60) [0x561c85310b20]
 8: (HybridAllocator::release(interval_set<unsigned long, std::map<unsigned long, unsigned long, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long> > > > const&)+0x3a) [0x561c853143ca]
 9: (BlueStore::_txc_release_alloc(BlueStore::TransContext*)+0x5f) [0x561c851ee83f]
 10: (BlueStore::_txc_finish(BlueStore::TransContext*)+0x1be) [0x561c8522f4ae]
 11: (BlueStore::_txc_state_proc(BlueStore::TransContext*)+0xaa) [0x561c8522fe9a]
 12: (BlueStore::_kv_finalize_thread()+0x604) [0x561c85232ed4]
 13: (BlueStore::KVFinalizeThread::entry()+0xd) [0x561c852625ed]
 14: (()+0x7ea5) [0x7fcf840a2ea5]
 15: (clone()+0x6d) [0x7fcf82f6596d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
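
As a reading aid for a dump like the one above, here is a minimal Python sketch that pulls out the failed assertion and the allocator classes named in the backtrace. It is not part of Ceph: the names `summarize_crash`, `ASSERT_RE`, `FRAME_RE` and `SAMPLE` are hypothetical, the frame format is assumed to match the lines shown in this thread, and `SAMPLE` is a shortened excerpt of the dump with the build path trimmed.

```python
import re

# Assumption: the assert line looks like "...: FAILED ceph_assert(<condition>)".
# The non-greedy group stops at the first ")" and would truncate a condition
# that itself contains parentheses; good enough for a sketch.
ASSERT_RE = re.compile(r"FAILED ceph_assert\((.*?)\)")

# Assumption: each backtrace frame looks like " N: (<symbol>+0xOFF) [0xADDR]".
FRAME_RE = re.compile(
    r"^[ \t]*\d+: \((?P<body>.*)\) \[0x[0-9a-f]+\][ \t]*$", re.MULTILINE
)

# Shortened excerpt from the crash dump in this thread (path trimmed).
SAMPLE = """\
fastbmap_allocator_impl.h: 572: FAILED ceph_assert(available >= allocated)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x561c84cc2c7d]
 2: (()+0x4d8e45) [0x561c84cc2e45]
 3: (HybridAllocator::_add_to_tree(unsigned long, unsigned long)+0x49e) [0x561c853167de]
 6: (BlueStore::_txc_release_alloc(BlueStore::TransContext*)+0x5f) [0x561c851ee83f]
"""


def summarize_crash(log_text):
    """Return the failed assertion and the symbols found in the backtrace."""
    match = ASSERT_RE.search(log_text)
    symbols = []
    for frame in FRAME_RE.finditer(log_text):
        # Keep only the qualified function name; drop arguments and offset.
        name = frame.group("body").split("(", 1)[0]
        if name and name not in symbols:  # skip unresolved "()" frames
            symbols.append(name)
    allocator_classes = sorted(
        {name.split("::")[0] for name in symbols if "Allocator" in name}
    )
    return {
        "assertion": match.group(1) if match else None,
        "allocator_classes": allocator_classes,
        "symbols": symbols,
    }


if __name__ == "__main__":
    summary = summarize_crash(SAMPLE)
    print("assertion:", summary["assertion"])
    print("allocators in backtrace:", ", ".join(summary["allocator_classes"]))
```

If the allocator list includes HybridAllocator, as it does here, the crash at least matches the code path discussed in the tracker issue. In Nautilus the allocator is normally selected with the `bluestore_allocator` option, and an OSD only picks up a change after a restart; please check this against the documentation for your release before relying on it.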
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx