Hi,
Our mon has suddenly started acting up and is dying in a crash loop with the following:
2019-10-04 14:00:24.339583 lease_expire=0.000000 has v0 lc 4549352
-3> 2019-10-04 14:00:24.335 7f6e5d461700 5 mon.km-fsn-1-dc4-m1-797678@0(leader).paxos(paxos active c 4548623..4549352) is_readable = 1 - now=2019-10-04 14:00:24.339620 lease_expire=0.000000 has v0 lc 4549352
-2> 2019-10-04 14:00:24.343 7f6e5d461700 -1 mon.km-fsn-1-dc4-m1-797678@0(leader).osd e257349 get_full_from_pinned_map closest pinned map ver 252615 not available! error: (2) No such file or directory
-1> 2019-10-04 14:00:24.343 7f6e5d461700 -1 /build/ceph-14.2.4/src/mon/OSDMonitor.cc: In function 'int OSDMonitor::get_full_from_pinned_map(version_t, ceph::bufferlist&)' thread 7f6e5d461700 time 2019-10-04 14:00:24.347580
/build/ceph-14.2.4/src/mon/OSDMonitor.cc: 3932: FAILED ceph_assert(err == 0)
ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x7f6e68eb064e]
2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x7f6e68eb0829]
3: (OSDMonitor::get_full_from_pinned_map(unsigned long, ceph::buffer::v14_2_0::list&)+0x80b) [0x72802b]
4: (OSDMonitor::get_version_full(unsigned long, unsigned long, ceph::buffer::v14_2_0::list&)+0x3d2) [0x728c82]
5: (OSDMonitor::encode_trim_extra(std::shared_ptr<MonitorDBStore::Transaction>, unsigned long)+0x8c) [0x717c3c]
6: (PaxosService::maybe_trim()+0x473) [0x707443]
7: (Monitor::tick()+0xa9) [0x5ecf39]
8: (C_MonContext::finish(int)+0x39) [0x5c3f29]
9: (Context::complete(int)+0x9) [0x6070d9]
10: (SafeTimer::timer_thread()+0x190) [0x7f6e68f45580]
11: (SafeTimerThread::entry()+0xd) [0x7f6e68f46e4d]
12: (()+0x76ba) [0x7f6e67cab6ba]
13: (clone()+0x6d) [0x7f6e674d441d]
0> 2019-10-04 14:00:24.347 7f6e5d461700 -1 *** Caught signal (Aborted) **
in thread 7f6e5d461700 thread_name:safe_timer
ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable)
1: (()+0x11390) [0x7f6e67cb5390]
2: (gsignal()+0x38) [0x7f6e67402428]
3: (abort()+0x16a) [0x7f6e6740402a]
4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a3) [0x7f6e68eb069f]
5: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x7f6e68eb0829]
6: (OSDMonitor::get_full_from_pinned_map(unsigned long, ceph::buffer::v14_2_0::list&)+0x80b) [0x72802b]
7: (OSDMonitor::get_version_full(unsigned long, unsigned long, ceph::buffer::v14_2_0::list&)+0x3d2) [0x728c82]
8: (OSDMonitor::encode_trim_extra(std::shared_ptr<MonitorDBStore::Transaction>, unsigned long)+0x8c) [0x717c3c]
9: (PaxosService::maybe_trim()+0x473) [0x707443]
10: (Monitor::tick()+0xa9) [0x5ecf39]
11: (C_MonContext::finish(int)+0x39) [0x5c3f29]
12: (Context::complete(int)+0x9) [0x6070d9]
13: (SafeTimer::timer_thread()+0x190) [0x7f6e68f45580]
14: (SafeTimerThread::entry()+0xd) [0x7f6e68f46e4d]
15: (()+0x76ba) [0x7f6e67cab6ba]
16: (clone()+0x6d) [0x7f6e674d441d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
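To summarize the dump, here is a small Python sketch that pulls out the two values that seem to matter: the epoch after `.osd` (which I assume is the OSDMonitor's current osdmap epoch) and the pinned full-map version the mon could not find. The regexes and the `summarize` helper are my own illustration, not part of any Ceph tool, and the sample text is just two lines copied from the log above.

```python
import re

# Two lines copied verbatim from the crash log above.
LOG = """\
-2> 2019-10-04 14:00:24.343 7f6e5d461700 -1 mon.km-fsn-1-dc4-m1-797678@0(leader).osd e257349 get_full_from_pinned_map closest pinned map ver 252615 not available! error: (2) No such file or directory
/build/ceph-14.2.4/src/mon/OSDMonitor.cc: 3932: FAILED ceph_assert(err == 0)
"""

# Hypothetical patterns written against this specific log format.
PINNED_RE = re.compile(
    r"\.osd e(?P<epoch>\d+) get_full_from_pinned_map "
    r"closest pinned map ver (?P<pinned>\d+) not available"
)
ASSERT_RE = re.compile(
    r"(?P<file>\S+): (?P<line>\d+): FAILED ceph_assert\((?P<expr>.*)\)"
)


def summarize(log_text):
    """Collect the failing assert and the missing pinned map version."""
    summary = {}
    for line in log_text.splitlines():
        m = PINNED_RE.search(line)
        if m:
            # Assumption: 'e<N>' is the mon's current osdmap epoch.
            summary["osdmap_epoch"] = int(m.group("epoch"))
            summary["missing_pinned_version"] = int(m.group("pinned"))
        m = ASSERT_RE.search(line)
        if m:
            summary["assert_location"] = f"{m.group('file')}:{m.group('line')}"
            summary["assert_expr"] = m.group("expr")
    return summary


SUMMARY = summarize(LOG)
print(SUMMARY)
```

So the mon is at e257349, and while trimming it wants the pinned full map 252615, which is not in its store.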
The mon had been running fine for two months until now. The cluster itself had crashed earlier and is currently in recovery.
Any suggestions?