Re: Ceph Pacific mon is not starting after host reboot

On Tue, May 25, 2021 at 7:17 AM Eugen Block <eblock@xxxxxx> wrote:
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.4/rpm/el8/BUILD/ceph-16.2.4/src/osd/OSDMap.cc: In function 'void OSDMap::Incremental::encode(ceph::buffer::v15_2_0::list&, uint64_t) const' thread 7ff3b1aa1700 time
> 2021-05-25T13:44:26.732857+0000
> 2021-05-25T15:44:26.989087+02:00 pacific1 conmon[5132]:
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.4/rpm/el8/BUILD/ceph-16.2.4/src/osd/OSDMap.cc: 658: FAILED ceph_assert(target_v >=
> 9)
> 2021-05-25T15:44:26.989163+02:00 pacific1 conmon[5132]:
> 2021-05-25T15:44:26.989239+02:00 pacific1 conmon[5132]:  ceph version
> 16.2.4 (3cbe25cde3cfa028984618ad32de9edc4c1eaed0) pacific (stable)
> 2021-05-25T15:44:26.989314+02:00 pacific1 conmon[5132]:  1:
> (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x158) [0x7ff3bf61a59c]
> 2021-05-25T15:44:26.989388+02:00 pacific1 conmon[5132]:  2:
> /usr/lib64/ceph/libceph-common.so.2(+0x2767b6) [0x7ff3bf61a7b6]
> 2021-05-25T15:44:26.989489+02:00 pacific1 conmon[5132]:  3:
> (OSDMap::Incremental::encode(ceph::buffer::v15_2_0::list&, unsigned
> long) const+0x539) [0x7ff3bfa529f9]
> 2021-05-25T15:44:26.989560+02:00 pacific1 conmon[5132]:  4:
> (OSDMonitor::reencode_incremental_map(ceph::buffer::v15_2_0::list&,
> unsigned long)+0x1c9) [0x55e377b36df9]
> 2021-05-25T15:44:26.989627+02:00 pacific1 conmon[5132]:  5:
> (OSDMonitor::get_version(unsigned long, unsigned long,
> ceph::buffer::v15_2_0::list&)+0x1f4) [0x55e377b37234]
> 2021-05-25T15:44:26.989693+02:00 pacific1 conmon[5132]:  6:
> (OSDMonitor::build_incremental(unsigned int, unsigned int, unsigned
> long)+0x301) [0x55e377b3a3c1]
> 2021-05-25T15:44:26.989759+02:00 pacific1 conmon[5132]:  7:
> (OSDMonitor::send_incremental(unsigned int, MonSession*, bool,
> boost::intrusive_ptr<MonOpRequest>)+0x104) [0x55e377b3b094]
> 2021-05-25T15:44:26.989825+02:00 pacific1 conmon[5132]:  8:
> (OSDMonitor::check_osdmap_sub(Subscription*)+0x72) [0x55e377b42792]
> 2021-05-25T15:44:26.989891+02:00 pacific1 conmon[5132]:  9:
> (Monitor::handle_subscribe(boost::intrusive_ptr<MonOpRequest>)+0xe82)
> [0x55e3779da402]
> 2021-05-25T15:44:26.989967+02:00 pacific1 conmon[5132]:  10:
> (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x78d)
> [0x55e377a002ed]
> 2021-05-25T15:44:26.990046+02:00 pacific1 conmon[5132]:  11:
> (Monitor::_ms_dispatch(Message*)+0x670) [0x55e377a01910]
> 2021-05-25T15:44:26.990113+02:00 pacific1 conmon[5132]:  12:
> (Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x5c)
> [0x55e377a2ffdc]
> 2021-05-25T15:44:26.990179+02:00 pacific1 conmon[5132]:  13:
> (DispatchQueue::entry()+0x126a) [0x7ff3bf854b1a]
> 2021-05-25T15:44:26.990255+02:00 pacific1 conmon[5132]:  14:
> (DispatchQueue::DispatchThread::entry()+0x11) [0x7ff3bf904b71]
> 2021-05-25T15:44:26.990330+02:00 pacific1 conmon[5132]:  15:
> /lib64/libpthread.so.0(+0x814a) [0x7ff3bd10a14a]
> 2021-05-25T15:44:26.990420+02:00 pacific1 conmon[5132]:  16: clone()
> 2021-05-25T15:44:26.990497+02:00 pacific1 conmon[5132]:
> 2021-05-25T15:44:26.990573+02:00 pacific1 conmon[5132]: debug      0>
> 2021-05-25T13:44:26.742+0000 7ff3b1aa1700 -1 *** Caught signal
> (Aborted) **
> 2021-05-25T15:44:26.990648+02:00 pacific1 conmon[5132]:  in thread
> 7ff3b1aa1700 thread_name:ms_dispatch
> 2021-05-25T15:44:26.990723+02:00 pacific1 conmon[5132]:
> 2021-05-25T15:44:26.990806+02:00 pacific1 conmon[5132]:  ceph version
> 16.2.4 (3cbe25cde3cfa028984618ad32de9edc4c1eaed0) pacific (stable)
> 2021-05-25T15:44:26.990883+02:00 pacific1 conmon[5132]:  1:
> /lib64/libpthread.so.0(+0x12b20) [0x7ff3bd114b20]
> 2021-05-25T15:44:26.990958+02:00 pacific1 conmon[5132]:  2: gsignal()
> 2021-05-25T15:44:26.991034+02:00 pacific1 conmon[5132]:  3: abort()
> 2021-05-25T15:44:26.991110+02:00 pacific1 conmon[5132]:  4:
> (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x1a9) [0x7ff3bf61a5ed]
> 2021-05-25T15:44:26.991176+02:00 pacific1 conmon[5132]:  5:
> /usr/lib64/ceph/libceph-common.so.2(+0x2767b6) [0x7ff3bf61a7b6]
> 2021-05-25T15:44:26.991251+02:00 pacific1 conmon[5132]:  6:
> (OSDMap::Incremental::encode(ceph::buffer::v15_2_0::list&, unsigned
> long) const+0x539) [0x7ff3bfa529f9]
> 2021-05-25T15:44:26.991326+02:00 pacific1 conmon[5132]:  7:
> (OSDMonitor::reencode_incremental_map(ceph::buffer::v15_2_0::list&,
> unsigned long)+0x1c9) [0x55e377b36df9]
> 2021-05-25T15:44:26.991393+02:00 pacific1 conmon[5132]:  8:
> (OSDMonitor::get_version(unsigned long, unsigned long,
> ceph::buffer::v15_2_0::list&)+0x1f4) [0x55e377b37234]
> 2021-05-25T15:44:26.991460+02:00 pacific1 conmon[5132]:  9:
> (OSDMonitor::build_incremental(unsigned int, unsigned int, unsigned
> long)+0x301) [0x55e377b3a3c1]
> 2021-05-25T15:44:26.991557+02:00 pacific1 conmon[5132]:  10:
> (OSDMonitor::send_incremental(unsigned int, MonSession*, bool,
> boost::intrusive_ptr<MonOpRequest>)+0x104) [0x55e377b3b094]
> 2021-05-25T15:44:26.991628+02:00 pacific1 conmon[5132]:  11:
> (OSDMonitor::check_osdmap_sub(Subscription*)+0x72) [0x55e377b42792]
> 2021-05-25T15:44:26.991695+02:00 pacific1 conmon[5132]:  12:
> (Monitor::handle_subscribe(boost::intrusive_ptr<MonOpRequest>)+0xe82)
> [0x55e3779da402]
> 2021-05-25T15:44:26.991761+02:00 pacific1 conmon[5132]:  13:
> (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x78d)
> [0x55e377a002ed]
> 2021-05-25T15:44:26.991827+02:00 pacific1 conmon[5132]:  14:
> (Monitor::_ms_dispatch(Message*)+0x670) [0x55e377a01910]
> 2021-05-25T15:44:26.991893+02:00 pacific1 conmon[5132]:  15:
> (Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x5c)
> [0x55e377a2ffdc]
> 2021-05-25T15:44:26.991959+02:00 pacific1 conmon[5132]:  16:
> (DispatchQueue::entry()+0x126a) [0x7ff3bf854b1a]
> 2021-05-25T15:44:26.992025+02:00 pacific1 conmon[5132]:  17:
> (DispatchQueue::DispatchThread::entry()+0x11) [0x7ff3bf904b71]
> 2021-05-25T15:44:26.992091+02:00 pacific1 conmon[5132]:  18:
> /lib64/libpthread.so.0(+0x814a) [0x7ff3bd10a14a]
> 2021-05-25T15:44:26.992156+02:00 pacific1 conmon[5132]:  19: clone()
> ---snip---
>
>
> I can't tell if this is due to the limited resources in my virtual
> cluster but I figured since the non-stretch mode seems to work as
> expected this could be a problem with the stretch mode. I can provide
> more information if required, just let me know what I can do.

This crash is caused by an overzealous safety check that fails when the
monitor encodes the osdmap for kernel clients in stretch mode. It is fixed
for the next Pacific point release[1]. Sorry for the trouble.
-Greg

[1]: https://github.com/ceph/ceph/pull/40484/commits/9453e34ea7c480ddfea8363cbad76cf1e2d46625
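
For readers trying to picture the failure mode described above, here is a
minimal sketch of how a version check of this kind can abort a daemon. It is
an illustrative model only, not the Ceph code: the function names, the
boolean "peer_has_new_features" flag, and the fallback version 6 are all
assumptions made up for the example. Only the threshold 9 comes from the
failed assert in the backtrace (target_v >= 9). The "lenient" variant shows
one possible way to relax such a check and is not claimed to be what the
linked commit does.

```python
# Illustrative model only; NOT the actual Ceph implementation.
# All names and the version-6 fallback are hypothetical.

MIN_STRETCH_VERSION = 9  # threshold taken from the failed assert (target_v >= 9)


def pick_target_version(peer_has_new_features: bool) -> int:
    """Pick an encoding version the peer can decode (hypothetical mapping)."""
    return 9 if peer_has_new_features else 6


def encode_strict(peer_has_new_features: bool, has_stretch_change: bool) -> int:
    """Over-strict check: abort if stretch data cannot be encoded for the peer."""
    target_v = pick_target_version(peer_has_new_features)
    if has_stretch_change and target_v < MIN_STRETCH_VERSION:
        # In a daemon, a failed assert like this aborts the whole process.
        raise AssertionError("FAILED assert(target_v >= 9)")
    return target_v


def encode_lenient(peer_has_new_features: bool, has_stretch_change: bool):
    """One possible relaxation: leave out stretch fields for older peers."""
    target_v = pick_target_version(peer_has_new_features)
    include_stretch = has_stretch_change and target_v >= MIN_STRETCH_VERSION
    return target_v, include_stretch
```

In this model, an older peer (such as a client lacking the newest feature
bits) subscribing while a stretch-mode change is pending makes
encode_strict raise, whereas encode_lenient returns a lower version with the
stretch fields left out.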