mon sudden crash loop - pinned map

Hi,
Our mon suddenly started acting up and is now dying in a crash loop with the following:


2019-10-04 14:00:24.339583 lease_expire=0.000000 has v0 lc 4549352
    -3> 2019-10-04 14:00:24.335 7f6e5d461700  5 mon.km-fsn-1-dc4-m1-797678@0(leader).paxos(paxos active c 4548623..4549352) is_readable = 1 - now=2019-10-04 14:00:24.339620 lease_expire=0.000000 has v0 lc 4549352
    -2> 2019-10-04 14:00:24.343 7f6e5d461700 -1 mon.km-fsn-1-dc4-m1-797678@0(leader).osd e257349 get_full_from_pinned_map closest pinned map ver 252615 not available! error: (2) No such file or directory
    -1> 2019-10-04 14:00:24.343 7f6e5d461700 -1 /build/ceph-14.2.4/src/mon/OSDMonitor.cc: In function 'int OSDMonitor::get_full_from_pinned_map(version_t, ceph::bufferlist&)' thread 7f6e5d461700 time 2019-10-04 14:00:24.347580
/build/ceph-14.2.4/src/mon/OSDMonitor.cc: 3932: FAILED ceph_assert(err == 0)

 ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x7f6e68eb064e]
 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x7f6e68eb0829]
 3: (OSDMonitor::get_full_from_pinned_map(unsigned long, ceph::buffer::v14_2_0::list&)+0x80b) [0x72802b]
 4: (OSDMonitor::get_version_full(unsigned long, unsigned long, ceph::buffer::v14_2_0::list&)+0x3d2) [0x728c82]
 5: (OSDMonitor::encode_trim_extra(std::shared_ptr<MonitorDBStore::Transaction>, unsigned long)+0x8c) [0x717c3c]
 6: (PaxosService::maybe_trim()+0x473) [0x707443]
 7: (Monitor::tick()+0xa9) [0x5ecf39]
 8: (C_MonContext::finish(int)+0x39) [0x5c3f29]
 9: (Context::complete(int)+0x9) [0x6070d9]
 10: (SafeTimer::timer_thread()+0x190) [0x7f6e68f45580]
 11: (SafeTimerThread::entry()+0xd) [0x7f6e68f46e4d]
 12: (()+0x76ba) [0x7f6e67cab6ba]
 13: (clone()+0x6d) [0x7f6e674d441d]

     0> 2019-10-04 14:00:24.347 7f6e5d461700 -1 *** Caught signal (Aborted) **
 in thread 7f6e5d461700 thread_name:safe_timer

 ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable)
 1: (()+0x11390) [0x7f6e67cb5390]
 2: (gsignal()+0x38) [0x7f6e67402428]
 3: (abort()+0x16a) [0x7f6e6740402a]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a3) [0x7f6e68eb069f]
 5: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x7f6e68eb0829]
 6: (OSDMonitor::get_full_from_pinned_map(unsigned long, ceph::buffer::v14_2_0::list&)+0x80b) [0x72802b]
 7: (OSDMonitor::get_version_full(unsigned long, unsigned long, ceph::buffer::v14_2_0::list&)+0x3d2) [0x728c82]
 8: (OSDMonitor::encode_trim_extra(std::shared_ptr<MonitorDBStore::Transaction>, unsigned long)+0x8c) [0x717c3c]
 9: (PaxosService::maybe_trim()+0x473) [0x707443]
 10: (Monitor::tick()+0xa9) [0x5ecf39]
 11: (C_MonContext::finish(int)+0x39) [0x5c3f29]
 12: (Context::complete(int)+0x9) [0x6070d9]
 13: (SafeTimer::timer_thread()+0x190) [0x7f6e68f45580]
 14: (SafeTimerThread::entry()+0xd) [0x7f6e68f46e4d]
 15: (()+0x76ba) [0x7f6e67cab6ba]
 16: (clone()+0x6d) [0x7f6e674d441d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
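
For anyone triaging a similar trace, the two numbers that matter in the failing line are the osdmap epoch the mon is at (e257349) and the pinned full-map version it could not read (252615). Below is a minimal sketch for pulling those out of a mon log, assuming the message format matches the line above exactly; the function name and regular expression are illustrative and are not part of Ceph.

```python
import re

# Matches the OSDMonitor error line as it appears in the log above.
# The pattern is an assumption based on that single line, not a Ceph-defined format.
PINNED_RE = re.compile(
    r"\.osd e(?P<epoch>\d+) get_full_from_pinned_map "
    r"closest pinned map ver (?P<pinned>\d+) not available"
)


def find_missing_pinned_maps(lines):
    """Return (osdmap_epoch, missing_pinned_version) pairs found in the given log lines."""
    hits = []
    for line in lines:
        m = PINNED_RE.search(line)
        if m:
            hits.append((int(m.group("epoch")), int(m.group("pinned"))))
    return hits


sample = (
    "    -2> 2019-10-04 14:00:24.343 7f6e5d461700 -1 "
    "mon.km-fsn-1-dc4-m1-797678@0(leader).osd e257349 get_full_from_pinned_map "
    "closest pinned map ver 252615 not available! "
    "error: (2) No such file or directory"
)
print(find_missing_pinned_maps([sample]))
```

To scan a whole log, pass an open file object instead of the one-element list.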


The mon had been running fine for 2 months; this is a crashed cluster that is currently in recovery.

Any suggestions?