Quoting Martin Mlynář (nexus+ceph@xxxxxxxxxx):
> Do you think this could help? OSD does not even start, I'm getting a little lost how flushing caches could help.

I might have misunderstood. I thought the OSDs crashed when you set the config setting.

> According to trace I suspect something around processing config values.

I've just set the same config setting on a test cluster and restarted an OSD without problem. So, not sure what is going on there.

Gr. Stefan

I've compiled ceph-osd with debug symbols and got a better backtrace:
-24> 2020-01-22 13:12:53.614 7f83ed064700 4 set_mon_vals no callback set
-23> 2020-01-22 13:12:53.614 7f83ee867700 10 monclient: discarding stray monitor message auth_reply(proto 2 0 (0) Success) v1
-22> 2020-01-22 13:12:53.614 7f83ed064700 10 set_mon_vals osd_crush_update_on_start = true
-21> 2020-01-22 13:12:53.614 7f83ed064700 10 set_mon_vals osd_max_backfills = 64
-20> 2020-01-22 13:12:53.614 7f83ed064700 10 set_mon_vals osd_memory_target = 2147483648
-19> 2020-01-22 13:12:53.614 7f83ed064700 10 set_mon_vals osd_recovery_max_active = 40
-18> 2020-01-22 13:12:53.614 7f83ed064700 10 set_mon_vals osd_recovery_max_single_start = 1000
-17> 2020-01-22 13:12:53.614 7f83ed064700 10 set_mon_vals osd_recovery_sleep_hdd = 0.000000
-16> 2020-01-22 13:12:53.614 7f83ed064700 10 set_mon_vals osd_recovery_sleep_hybrid = 0.000000
-15> 2020-01-22 13:12:53.627 7f83f0276c40 0 set uid:gid to 64045:64045 (ceph:ceph)
-14> 2020-01-22 13:12:53.627 7f83f0276c40 0 ceph version 14.2.6 (f0aa067ac7a02ee46ea48aa26c6e298b5ea272e9) nautilus (stable), process ceph-osd, pid 1111622
-13> 2020-01-22 13:12:53.649 7f83f0276c40 0 pidfile_write: ignore empty --pid-file
-12> 2020-01-22 13:12:53.657 7f83f0276c40 5 asok(0x5580518fa000) init /var/run/ceph/ceph-osd.6.asok
-11> 2020-01-22 13:12:53.657 7f83f0276c40 5 asok(0x5580518fa000) bind_and_listen /var/run/ceph/ceph-osd.6.asok
-10> 2020-01-22 13:12:53.657 7f83f0276c40 5 asok(0x5580518fa000) register_command 0 hook 0x558051872fc0
-9> 2020-01-22 13:12:53.657 7f83f0276c40 5 asok(0x5580518fa000) register_command version hook 0x558051872fc0
-8> 2020-01-22 13:12:53.657 7f83f0276c40 5 asok(0x5580518fa000) register_command git_version hook 0x558051872fc0
-7> 2020-01-22 13:12:53.657 7f83f0276c40 5 asok(0x5580518fa000) register_command help hook 0x558051874220
-6> 2020-01-22 13:12:53.657 7f83f0276c40 5 asok(0x5580518fa000) register_command get_command_descriptions hook 0x558051874260
-5> 2020-01-22 13:12:53.657 7f83ed865700 5 asok(0x5580518fa000) entry start
-4> 2020-01-22 13:12:53.670 7f83f0276c40 5 object store type is bluestore
-3> 2020-01-22 13:12:53.675 7f83f0276c40 1 bdev create path /var/lib/ceph/osd/ceph-6/block type kernel
-2> 2020-01-22 13:12:53.675 7f83f0276c40 1 bdev(0x5580518f3f80 /var/lib/ceph/osd/ceph-6/block) open path /var/lib/ceph/osd/ceph-6/block
-1> 2020-01-22 13:12:53.675 7f83f0276c40 1 bdev(0x5580518f3f80 /var/lib/ceph/osd/ceph-6/block) open size 3000588304384 (0x2baa1000000, 2.7 TiB) block_size 4096 (4 KiB) rotational discard not supported
0> 2020-01-22 13:12:53.714 7f83f0276c40 -1 *** Caught signal (Aborted) **
in thread 7f83f0276c40 thread_name:ceph-osd
ceph version 14.2.6 (f0aa067ac7a02ee46ea48aa26c6e298b5ea272e9) nautilus (stable)
1: (()+0x2c19654) [0x558045ec6654]
2: (()+0x12730) [0x7f83f0d1f730]
3: (gsignal()+0x10b) [0x7f83f08027bb]
4: (abort()+0x121) [0x7f83f07ed535]
5: (()+0x8c983) [0x7f83f0bb5983]
6: (()+0x928c6) [0x7f83f0bbb8c6]
7: (()+0x92901) [0x7f83f0bbb901]
8: (()+0x92b34) [0x7f83f0bbbb34]
9: (void boost::throw_exception<boost::bad_get>(boost::bad_get const&)+0x7b) [0x5580454d5430]
10: (Option::size_t&& boost::relaxed_get<Option::size_t, boost::blank, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, unsigned long, long, double, bool, entity_addr_t, entity_addrvec_t, std::chrono::duration<long, std::ratio<1l, 1l> >, Option::size_t, uuid_d>(boost::variant<boost::blank, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, unsigned long, long, double, bool, entity_addr_t, entity_addrvec_t, std::chrono::duration<long, std::ratio<1l, 1l> >, Option::size_t, uuid_d>&&)+0x5b) [0x5580454d6223]
11: (Option::size_t&& boost::strict_get<Option::size_t, boost::blank, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, unsigned long, long, double, bool, entity_addr_t, entity_addrvec_t, std::chrono::duration<long, std::ratio<1l, 1l> >, Option::size_t, uuid_d>(boost::variant<boost::blank, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, unsigned long, long, double, bool, entity_addr_t, entity_addrvec_t, std::chrono::duration<long, std::ratio<1l, 1l> >, Option::size_t, uuid_d>&&)+0x20) [0x5580454d4a39]
12: (Option::size_t&& boost::get<Option::size_t, boost::blank, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, unsigned long, long, double, bool, entity_addr_t, entity_addrvec_t, std::chrono::duration<long, std::ratio<1l, 1l> >, Option::size_t, uuid_d>(boost::variant<boost::blank, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, unsigned long, long, double, bool, entity_addr_t, entity_addrvec_t, std::chrono::duration<long, std::ratio<1l, 1l> >, Option::size_t, uuid_d>&&)+0x20) [0x5580454d1ed7]
13: (Option::size_t const md_config_t::get_val<Option::size_t>(ConfigValues const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const+0x48) [0x5580454ce882]
14: (Option::size_t const ConfigProxy::get_val<Option::size_t>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const+0x58) [0x5580454cb9b8]
15: (BlueStore::_set_cache_sizes()+0x159) [0x558045ce2213]
16: (BlueStore::_open_bdev(bool)+0x301) [0x558045ce6be3]
17: (BlueStore::get_devices(std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*)+0xf9) [0x558045d0f16d]
18: (BlueStore::get_numa_node(int*, std::set<int, std::less<int>, std::allocator<int> >*, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*)+0x79) [0x558045d0eb55]
19: (main()+0x3aae) [0x5580454c2460]
20: (__libc_start_main()+0xeb) [0x7f83f07ef09b]
21: (_start()+0x2a) [0x5580454bda2a]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
I also managed some gdb debugging (in int BlueStore::_set_cache_sizes()):
(gdb) n
4116 cache_autotune_interval =
(gdb) n
4117 cct->_conf.get_val<double>("bluestore_cache_autotune_interval");
(gdb) p cache_autotune_interval
$3 = 5
(gdb) n
4118 osd_memory_target = cct->_conf.get_val<Option::size_t>("osd_memory_target");
(gdb) s
std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string<std::allocator<char> > (this=0x7fffffffc140, __s=0x555558d26c2f "osd_memory_target", __a=...)
at /usr/include/c++/8/bits/basic_string.h:515
515 : _M_dataplus(_M_local_data(), __a)
(gdb) n
516 { _M_construct(__s, __s ? __s + traits_type::length(__s) : __s+npos); }
(gdb)
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::bad_get> >'
what(): boost::bad_get: failed value get using boost::get
But there I'm stuck. Debugging C++ code in gdb is really dark sorcery to me.

The other get_val calls look fine, so maybe get_val<Option::size_t> is the problem? It looks like the trouble is outside of Ceph itself. What system are you testing on? This is Debian with the official Debian build from buster-backports. Maybe it is one of Debian's patches?
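
For readers following the backtrace, here is a minimal sketch of how a typed getter on a variant can throw the way boost::get does in frames 9 to 14. It is not Ceph code, and it does not establish the cause of this crash. Assumptions: std::variant and std::bad_variant_access stand in for boost::variant and boost::bad_get, the alternative list is shortened, and SizeValue and read_fails are hypothetical names (SizeValue plays the role of Option::size_t).

```cpp
#include <cstdint>
#include <string>
#include <variant>

// Hypothetical stand-in for Option::size_t: a distinct wrapper type, so the
// variant can tell a "size" value apart from a plain unsigned integer.
struct SizeValue {
    std::uint64_t value;
};

// Shortened stand-in for the option value variant seen in the backtrace.
using OptionValue = std::variant<std::monostate, std::string, std::uint64_t,
                                 std::int64_t, double, bool, SizeValue>;

// Typed getter: succeeds only if the stored alternative is exactly T.
// On a mismatch std::get throws std::bad_variant_access (the std
// counterpart of boost::bad_get).
template <typename T>
T get_val(const OptionValue& v) {
    return std::get<T>(v);
}

// Returns true if reading the stored value as T would throw.
template <typename T>
bool read_fails(const OptionValue& v) {
    try {
        (void)get_val<T>(v);
        return false;
    } catch (const std::bad_variant_access&) {
        return true;
    }
}
```

With this sketch, a value stored as std::uint64_t (for example OptionValue{std::in_place_type<std::uint64_t>, 2147483648ULL}) cannot be read with get_val<SizeValue>, even though the number itself is a valid size. The read succeeds only when the stored alternative is SizeValue. If nothing catches the exception, the process terminates, which matches the abort in the trace.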
-- Martin Mlynář
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com