Re: Ceph Pacific mon is not starting after host reboot

Thanks for the confirmation, Greg! I’ll try with a newer release then. That’s why we’re testing, isn’t it? ;-) Then the OP’s issue is probably not resolved yet, since he didn’t mention a stretch cluster. Sorry for hijacking the thread.

Quoting Gregory Farnum <gfarnum@xxxxxxxxxx>:

On Tue, May 25, 2021 at 7:17 AM Eugen Block <eblock@xxxxxx> wrote:
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.4/rpm/el8/BUILD/ceph-16.2.4/src/osd/OSDMap.cc: In function 'void OSDMap::Incremental::encode(ceph::buffer::v15_2_0::list&, uint64_t) const' thread 7ff3b1aa1700 time
2021-05-25T13:44:26.732857+0000
2021-05-25T15:44:26.989087+02:00 pacific1 conmon[5132]:
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.4/rpm/el8/BUILD/ceph-16.2.4/src/osd/OSDMap.cc: 658: FAILED ceph_assert(target_v >=
9)
2021-05-25T15:44:26.989163+02:00 pacific1 conmon[5132]:
2021-05-25T15:44:26.989239+02:00 pacific1 conmon[5132]:  ceph version
16.2.4 (3cbe25cde3cfa028984618ad32de9edc4c1eaed0) pacific (stable)
2021-05-25T15:44:26.989314+02:00 pacific1 conmon[5132]:  1:
(ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x158) [0x7ff3bf61a59c]
2021-05-25T15:44:26.989388+02:00 pacific1 conmon[5132]:  2:
/usr/lib64/ceph/libceph-common.so.2(+0x2767b6) [0x7ff3bf61a7b6]
2021-05-25T15:44:26.989489+02:00 pacific1 conmon[5132]:  3:
(OSDMap::Incremental::encode(ceph::buffer::v15_2_0::list&, unsigned
long) const+0x539) [0x7ff3bfa529f9]
2021-05-25T15:44:26.989560+02:00 pacific1 conmon[5132]:  4:
(OSDMonitor::reencode_incremental_map(ceph::buffer::v15_2_0::list&,
unsigned long)+0x1c9) [0x55e377b36df9]
2021-05-25T15:44:26.989627+02:00 pacific1 conmon[5132]:  5:
(OSDMonitor::get_version(unsigned long, unsigned long,
ceph::buffer::v15_2_0::list&)+0x1f4) [0x55e377b37234]
2021-05-25T15:44:26.989693+02:00 pacific1 conmon[5132]:  6:
(OSDMonitor::build_incremental(unsigned int, unsigned int, unsigned
long)+0x301) [0x55e377b3a3c1]
2021-05-25T15:44:26.989759+02:00 pacific1 conmon[5132]:  7:
(OSDMonitor::send_incremental(unsigned int, MonSession*, bool,
boost::intrusive_ptr<MonOpRequest>)+0x104) [0x55e377b3b094]
2021-05-25T15:44:26.989825+02:00 pacific1 conmon[5132]:  8:
(OSDMonitor::check_osdmap_sub(Subscription*)+0x72) [0x55e377b42792]
2021-05-25T15:44:26.989891+02:00 pacific1 conmon[5132]:  9:
(Monitor::handle_subscribe(boost::intrusive_ptr<MonOpRequest>)+0xe82)
[0x55e3779da402]
2021-05-25T15:44:26.989967+02:00 pacific1 conmon[5132]:  10:
(Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x78d)
[0x55e377a002ed]
2021-05-25T15:44:26.990046+02:00 pacific1 conmon[5132]:  11:
(Monitor::_ms_dispatch(Message*)+0x670) [0x55e377a01910]
2021-05-25T15:44:26.990113+02:00 pacific1 conmon[5132]:  12:
(Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x5c)
[0x55e377a2ffdc]
2021-05-25T15:44:26.990179+02:00 pacific1 conmon[5132]:  13:
(DispatchQueue::entry()+0x126a) [0x7ff3bf854b1a]
2021-05-25T15:44:26.990255+02:00 pacific1 conmon[5132]:  14:
(DispatchQueue::DispatchThread::entry()+0x11) [0x7ff3bf904b71]
2021-05-25T15:44:26.990330+02:00 pacific1 conmon[5132]:  15:
/lib64/libpthread.so.0(+0x814a) [0x7ff3bd10a14a]
2021-05-25T15:44:26.990420+02:00 pacific1 conmon[5132]:  16: clone()
2021-05-25T15:44:26.990497+02:00 pacific1 conmon[5132]:
2021-05-25T15:44:26.990573+02:00 pacific1 conmon[5132]: debug      0>
2021-05-25T13:44:26.742+0000 7ff3b1aa1700 -1 *** Caught signal
(Aborted) **
2021-05-25T15:44:26.990648+02:00 pacific1 conmon[5132]:  in thread
7ff3b1aa1700 thread_name:ms_dispatch
2021-05-25T15:44:26.990723+02:00 pacific1 conmon[5132]:
2021-05-25T15:44:26.990806+02:00 pacific1 conmon[5132]:  ceph version
16.2.4 (3cbe25cde3cfa028984618ad32de9edc4c1eaed0) pacific (stable)
2021-05-25T15:44:26.990883+02:00 pacific1 conmon[5132]:  1:
/lib64/libpthread.so.0(+0x12b20) [0x7ff3bd114b20]
2021-05-25T15:44:26.990958+02:00 pacific1 conmon[5132]:  2: gsignal()
2021-05-25T15:44:26.991034+02:00 pacific1 conmon[5132]:  3: abort()
2021-05-25T15:44:26.991110+02:00 pacific1 conmon[5132]:  4:
(ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x1a9) [0x7ff3bf61a5ed]
2021-05-25T15:44:26.991176+02:00 pacific1 conmon[5132]:  5:
/usr/lib64/ceph/libceph-common.so.2(+0x2767b6) [0x7ff3bf61a7b6]
2021-05-25T15:44:26.991251+02:00 pacific1 conmon[5132]:  6:
(OSDMap::Incremental::encode(ceph::buffer::v15_2_0::list&, unsigned
long) const+0x539) [0x7ff3bfa529f9]
2021-05-25T15:44:26.991326+02:00 pacific1 conmon[5132]:  7:
(OSDMonitor::reencode_incremental_map(ceph::buffer::v15_2_0::list&,
unsigned long)+0x1c9) [0x55e377b36df9]
2021-05-25T15:44:26.991393+02:00 pacific1 conmon[5132]:  8:
(OSDMonitor::get_version(unsigned long, unsigned long,
ceph::buffer::v15_2_0::list&)+0x1f4) [0x55e377b37234]
2021-05-25T15:44:26.991460+02:00 pacific1 conmon[5132]:  9:
(OSDMonitor::build_incremental(unsigned int, unsigned int, unsigned
long)+0x301) [0x55e377b3a3c1]
2021-05-25T15:44:26.991557+02:00 pacific1 conmon[5132]:  10:
(OSDMonitor::send_incremental(unsigned int, MonSession*, bool,
boost::intrusive_ptr<MonOpRequest>)+0x104) [0x55e377b3b094]
2021-05-25T15:44:26.991628+02:00 pacific1 conmon[5132]:  11:
(OSDMonitor::check_osdmap_sub(Subscription*)+0x72) [0x55e377b42792]
2021-05-25T15:44:26.991695+02:00 pacific1 conmon[5132]:  12:
(Monitor::handle_subscribe(boost::intrusive_ptr<MonOpRequest>)+0xe82)
[0x55e3779da402]
2021-05-25T15:44:26.991761+02:00 pacific1 conmon[5132]:  13:
(Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x78d)
[0x55e377a002ed]
2021-05-25T15:44:26.991827+02:00 pacific1 conmon[5132]:  14:
(Monitor::_ms_dispatch(Message*)+0x670) [0x55e377a01910]
2021-05-25T15:44:26.991893+02:00 pacific1 conmon[5132]:  15:
(Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x5c)
[0x55e377a2ffdc]
2021-05-25T15:44:26.991959+02:00 pacific1 conmon[5132]:  16:
(DispatchQueue::entry()+0x126a) [0x7ff3bf854b1a]
2021-05-25T15:44:26.992025+02:00 pacific1 conmon[5132]:  17:
(DispatchQueue::DispatchThread::entry()+0x11) [0x7ff3bf904b71]
2021-05-25T15:44:26.992091+02:00 pacific1 conmon[5132]:  18:
/lib64/libpthread.so.0(+0x814a) [0x7ff3bd10a14a]
2021-05-25T15:44:26.992156+02:00 pacific1 conmon[5132]:  19: clone()
---snip---


I can't tell whether this is due to the limited resources in my virtual
cluster, but since non-stretch mode seems to work as expected, I figured
this could be a problem with stretch mode. I can provide more
information if required; just let me know what I can do.

This crash is caused by overzealous safety checks that break when
encoding the osdmap for kernel clients in stretch mode, and it is fixed
for the next Pacific point release[1]. Sorry for the trouble.
-Greg

[1]: https://github.com/ceph/ceph/pull/40484/commits/9453e34ea7c480ddfea8363cbad76cf1e2d46625
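
For anyone comparing their own mon crash against the one quoted above, here is a minimal Python sketch that rejoins the mail-wrapped conmon lines and pulls out the failed assertion and the Ceph version. It only assumes the log layout visible in this thread ("<timestamp> <host> conmon[<pid>]:" prefixes, continuation lines without a prefix). The names unwrap, summarize and SAMPLE are made up for illustration and are not part of any Ceph tooling, and the source path in SAMPLE is shortened for readability.

```python
import re

# Prefix of a journal line as it appears in this thread, e.g.
# "2021-05-25T15:44:26.989087+02:00 pacific1 conmon[5132]:"
PREFIX_RE = re.compile(r"^\d{4}-\d{2}-\d{2}T\S+\s+\S+\s+conmon\[\d+\]:")
# "<source file>: <line>: FAILED ceph_assert(<expr>)"
ASSERT_RE = re.compile(
    r"(?P<file>\S+): (?P<line>\d+): FAILED (?P<expr>ceph_assert\(.*\))"
)
VERSION_RE = re.compile(r"ceph version (\S+)")


def unwrap(lines):
    """Merge mail-wrapped continuation lines into one record per log line."""
    records = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = PREFIX_RE.match(line)
        if match:
            # A new log line starts here; drop the journal prefix.
            records.append(line[match.end():].strip())
        elif records:
            # No prefix: treat it as the wrapped tail of the previous line.
            records[-1] = (records[-1] + " " + line).strip()
    return records


def summarize(lines):
    """Return the first failed assertion and Ceph version found, if any."""
    result = {"version": None, "file": None, "line": None, "expr": None}
    for record in unwrap(lines):
        found = ASSERT_RE.search(record)
        if found and result["expr"] is None:
            result["file"] = found.group("file")
            result["line"] = int(found.group("line"))
            result["expr"] = found.group("expr")
        version = VERSION_RE.search(record)
        if version and result["version"] is None:
            result["version"] = version.group(1)
    return result


# Excerpt from the log above; the build path is shortened for readability.
SAMPLE = """\
2021-05-25T15:44:26.989087+02:00 pacific1 conmon[5132]:
BUILD/ceph-16.2.4/src/osd/OSDMap.cc: 658: FAILED ceph_assert(target_v >=
9)
2021-05-25T15:44:26.989239+02:00 pacific1 conmon[5132]:  ceph version
16.2.4 (3cbe25cde3cfa028984618ad32de9edc4c1eaed0) pacific (stable)
"""

if __name__ == "__main__":
    print(summarize(SAMPLE.splitlines()))
```

If the summary shows the ceph_assert(target_v >= 9) failure in OSDMap.cc on 16.2.4 and the cluster is in stretch mode, it is likely the same crash described above. Any other assertion points to a different problem.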

