Hi Igor,

unfortunately, the result is the same:

# ceph config dump
WHO  MASK  LEVEL  OPTION             VALUE       RO
osd        basic  osd_memory_target  2147483648

# /usr/bin/ceph-osd -d --cluster ceph --id 0 --setuser ceph --setgroup ceph
....
     0> 2020-01-23 10:48:04.436 7fc61b5b5c80 -1 *** Caught signal (Aborted) **
 in thread 7fc61b5b5c80 thread_name:ceph-osd

 ceph version 14.2.6 (f0aa067ac7a02ee46ea48aa26c6e298b5ea272e9) nautilus (stable)
 1: (()+0x12730) [0x7fc61c05d730]
 2: (gsignal()+0x10b) [0x7fc61bb417bb]
 3: (abort()+0x121) [0x7fc61bb2c535]
 4: (()+0x8c983) [0x7fc61bef4983]
 5: (()+0x928c6) [0x7fc61befa8c6]
 6: (()+0x92901) [0x7fc61befa901]
 7: (()+0x92b34) [0x7fc61befab34]
 8: (()+0x5a3f53) [0x55ecdabb2f53]
 9: (Option::size_t const md_config_t::get_val<Option::size_t>(ConfigValues const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const+0x81) [0x55ecdabb8c91]
 10: (BlueStore::_set_cache_sizes()+0x15a) [0x55ecdb033d8a]
 11: (BlueStore::_open_bdev(bool)+0x173) [0x55ecdb036b23]
 12: (BlueStore::get_devices(std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*)+0xef) [0x55ecdb09d7ef]
 13: (BlueStore::get_numa_node(int*, std::set<int, std::less<int>, std::allocator<int> >*, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*)+0x7b) [0x55ecdb04571b]
 14: (main()+0x2870) [0x55ecdab80440]
 15: (__libc_start_main()+0xeb) [0x7fc61bb2e09b]
 16: (_start()+0x2a) [0x55ecdabb2c6a]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
....

Best Regards,
Martin

On 22. 01. 20 at 16:33, Igor Fedotov wrote:
>
> Hi Martin,
>
> looks like a bug to me.
>
> You might want to remove all custom settings from the config database
> and try setting only osd-memory-target.
>
> Would that help?
>
>
> Thanks,
>
> Igor
>
> On 1/22/2020 3:43 PM, Martin Mlynář wrote:
>>
>>
>> On 21. 01. 20 at 21:12, Stefan Kooman wrote:
>>> Quoting Martin Mlynář (nexus+ceph@xxxxxxxxxx):
>>>
>>>> Do you think this could help? The OSD does not even start; I'm a little
>>>> lost as to how flushing caches could help.
>>> I might have misunderstood. I thought the OSDs crashed when you set the
>>> config setting.
>>>
>>>> According to the trace, I suspect something around processing config values.
>>> I've just set the same config setting on a test cluster and restarted an
>>> OSD without any problem, so I'm not sure what is going on there.
>>>
>>> Gr. Stefan
>>
>> I've compiled ceph-osd with debug symbols and got a better backtrace:
>>
>>    -24> 2020-01-22 13:12:53.614 7f83ed064700 4 set_mon_vals no callback set
>>    -23> 2020-01-22 13:12:53.614 7f83ee867700 10 monclient: discarding stray monitor message auth_reply(proto 2 0 (0) Success) v1
>>    -22> 2020-01-22 13:12:53.614 7f83ed064700 10 set_mon_vals osd_crush_update_on_start = true
>>    -21> 2020-01-22 13:12:53.614 7f83ed064700 10 set_mon_vals osd_max_backfills = 64
>>    -20> 2020-01-22 13:12:53.614 7f83ed064700 10 set_mon_vals osd_memory_target = 2147483648
>>    -19> 2020-01-22 13:12:53.614 7f83ed064700 10 set_mon_vals osd_recovery_max_active = 40
>>    -18> 2020-01-22 13:12:53.614 7f83ed064700 10 set_mon_vals osd_recovery_max_single_start = 1000
>>    -17> 2020-01-22 13:12:53.614 7f83ed064700 10 set_mon_vals osd_recovery_sleep_hdd = 0.000000
>>    -16> 2020-01-22 13:12:53.614 7f83ed064700 10 set_mon_vals osd_recovery_sleep_hybrid = 0.000000
>>    -15> 2020-01-22 13:12:53.627 7f83f0276c40 0 set uid:gid to 64045:64045 (ceph:ceph)
>>    -14> 2020-01-22 13:12:53.627 7f83f0276c40 0 ceph version 14.2.6 (f0aa067ac7a02ee46ea48aa26c6e298b5ea272e9) nautilus (stable), process ceph-osd, pid 1111622
>>    -13> 2020-01-22 13:12:53.649 7f83f0276c40 0 pidfile_write: ignore empty --pid-file
>>    -12> 2020-01-22 13:12:53.657 7f83f0276c40 5 asok(0x5580518fa000) init /var/run/ceph/ceph-osd.6.asok
>>    -11> 2020-01-22 13:12:53.657 7f83f0276c40 5 asok(0x5580518fa000) bind_and_listen /var/run/ceph/ceph-osd.6.asok
>>    -10> 2020-01-22 13:12:53.657 7f83f0276c40 5 asok(0x5580518fa000) register_command 0 hook 0x558051872fc0
>>     -9> 2020-01-22 13:12:53.657 7f83f0276c40 5 asok(0x5580518fa000) register_command version hook 0x558051872fc0
>>     -8> 2020-01-22 13:12:53.657 7f83f0276c40 5 asok(0x5580518fa000) register_command git_version hook 0x558051872fc0
>>     -7> 2020-01-22 13:12:53.657 7f83f0276c40 5 asok(0x5580518fa000) register_command help hook 0x558051874220
>>     -6> 2020-01-22 13:12:53.657 7f83f0276c40 5 asok(0x5580518fa000) register_command get_command_descriptions hook 0x558051874260
>>     -5> 2020-01-22 13:12:53.657 7f83ed865700 5 asok(0x5580518fa000) entry start
>>     -4> 2020-01-22 13:12:53.670 7f83f0276c40 5 object store type is bluestore
>>     -3> 2020-01-22 13:12:53.675 7f83f0276c40 1 bdev create path /var/lib/ceph/osd/ceph-6/block type kernel
>>     -2> 2020-01-22 13:12:53.675 7f83f0276c40 1 bdev(0x5580518f3f80 /var/lib/ceph/osd/ceph-6/block) open path /var/lib/ceph/osd/ceph-6/block
>>     -1> 2020-01-22 13:12:53.675 7f83f0276c40 1 bdev(0x5580518f3f80 /var/lib/ceph/osd/ceph-6/block) open size 3000588304384 (0x2baa1000000, 2.7 TiB) block_size 4096 (4 KiB) rotational discard not supported
>>      0> 2020-01-22 13:12:53.714 7f83f0276c40 -1 *** Caught signal (Aborted) **
>>  in thread 7f83f0276c40 thread_name:ceph-osd
>>
>>  ceph version 14.2.6 (f0aa067ac7a02ee46ea48aa26c6e298b5ea272e9) nautilus (stable)
>>  1: (()+0x2c19654) [0x558045ec6654]
>>  2: (()+0x12730) [0x7f83f0d1f730]
>>  3: (gsignal()+0x10b) [0x7f83f08027bb]
>>  4: (abort()+0x121) [0x7f83f07ed535]
>>  5: (()+0x8c983) [0x7f83f0bb5983]
>>  6: (()+0x928c6) [0x7f83f0bbb8c6]
>>  7: (()+0x92901) [0x7f83f0bbb901]
>>  8: (()+0x92b34) [0x7f83f0bbbb34]
>> * 9: (void boost::throw_exception<boost::bad_get>(boost::bad_get const&)+0x7b) [0x5580454d5430]*
>> * 10: (Option::size_t&& boost::relaxed_get<Option::size_t, boost::blank, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, unsigned long, long, double, bool, entity_addr_t, entity_addrvec_t, std::chrono::duration<long, std::ratio<1l, 1l> >, Option::size_t, uuid_d>(boost::variant<boost::blank, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, unsigned long, long, double, bool, entity_addr_t, entity_addrvec_t, std::chrono::duration<long, std::ratio<1l, 1l> >, Option::size_t, uuid_d>&&)+0x5b) [0x5580454d6223]*
>>  11: (Option::size_t&& boost::strict_get<Option::size_t, boost::blank, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, unsigned long, long, double, bool, entity_addr_t, entity_addrvec_t, std::chrono::duration<long, std::ratio<1l, 1l> >, Option::size_t, uuid_d>(boost::variant<boost::blank, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, unsigned long, long, double, bool, entity_addr_t, entity_addrvec_t, std::chrono::duration<long, std::ratio<1l, 1l> >, Option::size_t, uuid_d>&&)+0x20) [0x5580454d4a39]
>>  12: (Option::size_t&& boost::get<Option::size_t, boost::blank, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, unsigned long, long, double, bool, entity_addr_t, entity_addrvec_t, std::chrono::duration<long, std::ratio<1l, 1l> >, Option::size_t, uuid_d>(boost::variant<boost::blank, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, unsigned long, long, double, bool, entity_addr_t, entity_addrvec_t, std::chrono::duration<long, std::ratio<1l, 1l> >, Option::size_t, uuid_d>&&)+0x20) [0x5580454d1ed7]
>>  13: (Option::size_t const md_config_t::get_val<Option::size_t>(ConfigValues const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const+0x48) [0x5580454ce882]
>>  14: (Option::size_t const ConfigProxy::get_val<Option::size_t>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const+0x58) [0x5580454cb9b8]
>>  15: (BlueStore::_set_cache_sizes()+0x159) [0x558045ce2213]
>>  16: (BlueStore::_open_bdev(bool)+0x301) [0x558045ce6be3]
>>  17: (BlueStore::get_devices(std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*)+0xf9) [0x558045d0f16d]
>>  18: (BlueStore::get_numa_node(int*, std::set<int, std::less<int>, std::allocator<int> >*, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*)+0x79) [0x558045d0eb55]
>>  19: (main()+0x3aae) [0x5580454c2460]
>>  20: (__libc_start_main()+0xeb) [0x7f83f07ef09b]
>>  21: (_start()+0x2a) [0x5580454bda2a]
>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
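Frames 9 to 13 of the debug backtrace show how the abort happens. md_config_t::get_val<Option::size_t> calls boost::get<Option::size_t> on the option's stored boost::variant. The template arguments in frames 10 to 12 list `unsigned long` and `Option::size_t` as separate alternatives of that variant. boost::get throws boost::bad_get whenever the variant currently holds an alternative other than the one requested, and because nothing catches the exception, the process terminates with SIGABRT. The following is a minimal, hypothetical analog built on C++17 `std::variant`/`std::get`, which throw `std::bad_variant_access` where Boost throws `boost::bad_get`. `SizeT`, `OptionValue`, `get_val` and `get_val_throws` are illustrative names, not Ceph's API:

```cpp
#include <cstdint>
#include <string>
#include <variant>

// Hypothetical stand-in for Ceph's Option::size_t: a distinct wrapper type,
// so a plain integer and a "size" are different variant alternatives.
struct SizeT {
  std::uint64_t value;
};

// Hypothetical, trimmed-down option value variant. The real one in the
// backtrace also holds entity_addr_t, entity_addrvec_t, a duration and uuid_d.
using OptionValue = std::variant<std::monostate, std::string, std::uint64_t,
                                 std::int64_t, double, bool, SizeT>;

// Analog of get_val<T>(): request one specific alternative. std::get throws
// std::bad_variant_access if the variant holds any other alternative.
template <typename T>
T get_val(const OptionValue& v) {
  return std::get<T>(v);
}

// Returns true if get_val<T> would throw for this stored value.
template <typename T>
bool get_val_throws(const OptionValue& v) {
  try {
    (void)get_val<T>(v);
    return false;
  } catch (const std::bad_variant_access&) {
    return true;
  }
}
```

With `OptionValue v = std::uint64_t{2147483648ULL};`, `get_val<std::uint64_t>(v)` succeeds, while `get_val<SizeT>(v)` throws, even though the number is the same. This is consistent with the trace, though it does not by itself establish why the stored value would have the "wrong" alternative in this build.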
>>
>> I also managed some gdb debugging (in int BlueStore::_set_cache_sizes()):
>>
>> (gdb) n
>> 4116        cache_autotune_interval =
>> (gdb) n
>> 4117          cct->_conf.get_val<double>("bluestore_cache_autotune_interval");
>> (gdb) p cache_autotune_interval
>> $3 = 5
>> (gdb) n
>> 4118        osd_memory_target = cct->_conf.get_val<Option::size_t>("osd_memory_target");
>> (gdb) s
>> std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string<std::allocator<char> > (this=0x7fffffffc140, __s=0x555558d26c2f "osd_memory_target", __a=...)
>>     at /usr/include/c++/8/bits/basic_string.h:515
>> 515          : _M_dataplus(_M_local_data(), __a)
>> (gdb) n
>> 516          { _M_construct(__s, __s ? __s + traits_type::length(__s) : __s+npos); }
>> (gdb)
>> terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::bad_get> >'
>>   what():  boost::bad_get: failed value get using boost::get
>>
>> But that is where I'm stuck. Debugging C++ code with GDB is really dark
>> sorcery for me.
>>
>> The other get_val calls look fine, so maybe get_val<Option::size_t> is the
>> problem? It looks like trouble outside of Ceph. What system are you
>> testing on? This is Debian with the official Debian build from
>> buster-backports. Maybe one of Debian's patches?
>>
>> --
>> Martin Mlynář
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
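The question of whether get_val<Option::size_t> is the problem can be narrowed down by checking which alternative the stored value actually holds, instead of letting the failed get abort the process. The following is a minimal diagnostic sketch that uses C++17 `std::variant` as a stand-in for the boost::variant seen in the backtrace. `SizeT`, `OptionValue`, `held_type` and `try_get_val` are hypothetical names, not Ceph functions. `std::get_if` returns a null pointer on a mismatch rather than throwing, and `std::visit` names the alternative that is really stored:

```cpp
#include <cstdint>
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical stand-in for Ceph's Option::size_t wrapper type.
struct SizeT {
  std::uint64_t value;
};

// Hypothetical, trimmed-down option value variant.
using OptionValue = std::variant<std::monostate, std::string, std::uint64_t,
                                 std::int64_t, double, bool, SizeT>;

// Name the alternative the variant currently holds.
std::string held_type(const OptionValue& v) {
  return std::visit(
      [](const auto& x) -> std::string {
        using T = std::decay_t<decltype(x)>;
        if constexpr (std::is_same_v<T, std::monostate>) return "blank";
        else if constexpr (std::is_same_v<T, std::string>) return "string";
        else if constexpr (std::is_same_v<T, std::uint64_t>) return "uint64";
        else if constexpr (std::is_same_v<T, std::int64_t>) return "int64";
        else if constexpr (std::is_same_v<T, double>) return "double";
        else if constexpr (std::is_same_v<T, bool>) return "bool";
        else return "size";  // SizeT, the only remaining alternative
      },
      v);
}

// Checked getter: reports a type mismatch through `error` instead of
// throwing, so the caller can log which alternative was actually stored.
template <typename T>
bool try_get_val(const OptionValue& v, T& out, std::string& error) {
  if (const T* p = std::get_if<T>(&v)) {
    out = *p;
    return true;
  }
  error = "type mismatch: option holds " + held_type(v);
  return false;
}
```

Asking for `SizeT` from a value stored as `std::uint64_t` returns false with the error "type mismatch: option holds uint64". A checked getter like this only makes such a mismatch visible. It does not explain why the stored value has an unexpected type in a particular build.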