Thanks for the confirmation, Greg! I’ll try with a newer release then.
That’s why we’re testing, isn’t it? ;-)
Then the OP’s issue is probably not resolved yet since he didn’t
mention a stretch cluster. Sorry for hijacking the thread.
Quoting Gregory Farnum <gfarnum@xxxxxxxxxx>:
On Tue, May 25, 2021 at 7:17 AM Eugen Block <eblock@xxxxxx> wrote:
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.4/rpm/el8/BUILD/ceph-16.2.4/src/osd/OSDMap.cc: In function 'void OSDMap::Incremental::encode(ceph::buffer::v15_2_0::list&, uint64_t) const' thread 7ff3b1aa1700 time 2021-05-25T13:44:26.732857+0000
2021-05-25T15:44:26.989087+02:00 pacific1 conmon[5132]:
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.4/rpm/el8/BUILD/ceph-16.2.4/src/osd/OSDMap.cc: 658: FAILED ceph_assert(target_v >= 9)
2021-05-25T15:44:26.989163+02:00 pacific1 conmon[5132]:
2021-05-25T15:44:26.989239+02:00 pacific1 conmon[5132]: ceph version
16.2.4 (3cbe25cde3cfa028984618ad32de9edc4c1eaed0) pacific (stable)
2021-05-25T15:44:26.989314+02:00 pacific1 conmon[5132]: 1:
(ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x158) [0x7ff3bf61a59c]
2021-05-25T15:44:26.989388+02:00 pacific1 conmon[5132]: 2:
/usr/lib64/ceph/libceph-common.so.2(+0x2767b6) [0x7ff3bf61a7b6]
2021-05-25T15:44:26.989489+02:00 pacific1 conmon[5132]: 3:
(OSDMap::Incremental::encode(ceph::buffer::v15_2_0::list&, unsigned
long) const+0x539) [0x7ff3bfa529f9]
2021-05-25T15:44:26.989560+02:00 pacific1 conmon[5132]: 4:
(OSDMonitor::reencode_incremental_map(ceph::buffer::v15_2_0::list&,
unsigned long)+0x1c9) [0x55e377b36df9]
2021-05-25T15:44:26.989627+02:00 pacific1 conmon[5132]: 5:
(OSDMonitor::get_version(unsigned long, unsigned long,
ceph::buffer::v15_2_0::list&)+0x1f4) [0x55e377b37234]
2021-05-25T15:44:26.989693+02:00 pacific1 conmon[5132]: 6:
(OSDMonitor::build_incremental(unsigned int, unsigned int, unsigned
long)+0x301) [0x55e377b3a3c1]
2021-05-25T15:44:26.989759+02:00 pacific1 conmon[5132]: 7:
(OSDMonitor::send_incremental(unsigned int, MonSession*, bool,
boost::intrusive_ptr<MonOpRequest>)+0x104) [0x55e377b3b094]
2021-05-25T15:44:26.989825+02:00 pacific1 conmon[5132]: 8:
(OSDMonitor::check_osdmap_sub(Subscription*)+0x72) [0x55e377b42792]
2021-05-25T15:44:26.989891+02:00 pacific1 conmon[5132]: 9:
(Monitor::handle_subscribe(boost::intrusive_ptr<MonOpRequest>)+0xe82)
[0x55e3779da402]
2021-05-25T15:44:26.989967+02:00 pacific1 conmon[5132]: 10:
(Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x78d)
[0x55e377a002ed]
2021-05-25T15:44:26.990046+02:00 pacific1 conmon[5132]: 11:
(Monitor::_ms_dispatch(Message*)+0x670) [0x55e377a01910]
2021-05-25T15:44:26.990113+02:00 pacific1 conmon[5132]: 12:
(Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x5c)
[0x55e377a2ffdc]
2021-05-25T15:44:26.990179+02:00 pacific1 conmon[5132]: 13:
(DispatchQueue::entry()+0x126a) [0x7ff3bf854b1a]
2021-05-25T15:44:26.990255+02:00 pacific1 conmon[5132]: 14:
(DispatchQueue::DispatchThread::entry()+0x11) [0x7ff3bf904b71]
2021-05-25T15:44:26.990330+02:00 pacific1 conmon[5132]: 15:
/lib64/libpthread.so.0(+0x814a) [0x7ff3bd10a14a]
2021-05-25T15:44:26.990420+02:00 pacific1 conmon[5132]: 16: clone()
2021-05-25T15:44:26.990497+02:00 pacific1 conmon[5132]:
2021-05-25T15:44:26.990573+02:00 pacific1 conmon[5132]: debug 0>
2021-05-25T13:44:26.742+0000 7ff3b1aa1700 -1 *** Caught signal
(Aborted) **
2021-05-25T15:44:26.990648+02:00 pacific1 conmon[5132]: in thread
7ff3b1aa1700 thread_name:ms_dispatch
2021-05-25T15:44:26.990723+02:00 pacific1 conmon[5132]:
2021-05-25T15:44:26.990806+02:00 pacific1 conmon[5132]: ceph version
16.2.4 (3cbe25cde3cfa028984618ad32de9edc4c1eaed0) pacific (stable)
2021-05-25T15:44:26.990883+02:00 pacific1 conmon[5132]: 1:
/lib64/libpthread.so.0(+0x12b20) [0x7ff3bd114b20]
2021-05-25T15:44:26.990958+02:00 pacific1 conmon[5132]: 2: gsignal()
2021-05-25T15:44:26.991034+02:00 pacific1 conmon[5132]: 3: abort()
2021-05-25T15:44:26.991110+02:00 pacific1 conmon[5132]: 4:
(ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x1a9) [0x7ff3bf61a5ed]
2021-05-25T15:44:26.991176+02:00 pacific1 conmon[5132]: 5:
/usr/lib64/ceph/libceph-common.so.2(+0x2767b6) [0x7ff3bf61a7b6]
2021-05-25T15:44:26.991251+02:00 pacific1 conmon[5132]: 6:
(OSDMap::Incremental::encode(ceph::buffer::v15_2_0::list&, unsigned
long) const+0x539) [0x7ff3bfa529f9]
2021-05-25T15:44:26.991326+02:00 pacific1 conmon[5132]: 7:
(OSDMonitor::reencode_incremental_map(ceph::buffer::v15_2_0::list&,
unsigned long)+0x1c9) [0x55e377b36df9]
2021-05-25T15:44:26.991393+02:00 pacific1 conmon[5132]: 8:
(OSDMonitor::get_version(unsigned long, unsigned long,
ceph::buffer::v15_2_0::list&)+0x1f4) [0x55e377b37234]
2021-05-25T15:44:26.991460+02:00 pacific1 conmon[5132]: 9:
(OSDMonitor::build_incremental(unsigned int, unsigned int, unsigned
long)+0x301) [0x55e377b3a3c1]
2021-05-25T15:44:26.991557+02:00 pacific1 conmon[5132]: 10:
(OSDMonitor::send_incremental(unsigned int, MonSession*, bool,
boost::intrusive_ptr<MonOpRequest>)+0x104) [0x55e377b3b094]
2021-05-25T15:44:26.991628+02:00 pacific1 conmon[5132]: 11:
(OSDMonitor::check_osdmap_sub(Subscription*)+0x72) [0x55e377b42792]
2021-05-25T15:44:26.991695+02:00 pacific1 conmon[5132]: 12:
(Monitor::handle_subscribe(boost::intrusive_ptr<MonOpRequest>)+0xe82)
[0x55e3779da402]
2021-05-25T15:44:26.991761+02:00 pacific1 conmon[5132]: 13:
(Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x78d)
[0x55e377a002ed]
2021-05-25T15:44:26.991827+02:00 pacific1 conmon[5132]: 14:
(Monitor::_ms_dispatch(Message*)+0x670) [0x55e377a01910]
2021-05-25T15:44:26.991893+02:00 pacific1 conmon[5132]: 15:
(Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x5c)
[0x55e377a2ffdc]
2021-05-25T15:44:26.991959+02:00 pacific1 conmon[5132]: 16:
(DispatchQueue::entry()+0x126a) [0x7ff3bf854b1a]
2021-05-25T15:44:26.992025+02:00 pacific1 conmon[5132]: 17:
(DispatchQueue::DispatchThread::entry()+0x11) [0x7ff3bf904b71]
2021-05-25T15:44:26.992091+02:00 pacific1 conmon[5132]: 18:
/lib64/libpthread.so.0(+0x814a) [0x7ff3bd10a14a]
2021-05-25T15:44:26.992156+02:00 pacific1 conmon[5132]: 19: clone()
---snip---
I can't tell whether this is due to the limited resources in my virtual
cluster, but since non-stretch mode seems to work as expected, I figured
this could be a problem with stretch mode. I can provide more
information if required; just let me know what I can do.
This crash is caused by overzealous safety checks that break when
encoding the osdmap for kernel clients in stretch mode; it is fixed
for the next pacific point release[1]. Sorry for the trouble.
-Greg
[1]:
https://github.com/ceph/ceph/pull/40484/commits/9453e34ea7c480ddfea8363cbad76cf1e2d46625
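
For readers following along, the failure mode described above can be pictured with a toy model. This is only a sketch and not Ceph's OSDMap code: the function names, the "older" version number 6, and the `strict` switch are all hypothetical, and the relaxed branch is one plausible shape for such a fix rather than the actual patch. The only detail taken from the thread is the `target_v >= 9` check shown in the log.

```python
# Toy model only: NOT Ceph's OSDMap code. Every name and number here is
# hypothetical, except the `target_v >= 9` check quoted from the crash log.

STRETCH_MIN_V = 9  # encoding version assumed to carry stretch-mode fields


def pick_target_version(client_supports_v9: bool) -> int:
    """Choose an encoding version the receiving client can decode."""
    return 9 if client_supports_v9 else 6  # 6 stands in for "some older version"


def encode_incremental(stretch_mode: bool, client_supports_v9: bool,
                       strict: bool) -> dict:
    """Encode a fake incremental map as a dict for the given client."""
    target_v = pick_target_version(client_supports_v9)
    out = {"v": target_v}
    if stretch_mode:
        if strict and target_v < STRETCH_MIN_V:
            # Overly strict check: fails as soon as an older client
            # subscribes to a map that has stretch mode enabled.
            raise AssertionError("FAILED assert(target_v >= 9)")
        if target_v >= STRETCH_MIN_V:
            out["stretch_mode"] = True  # only newer clients get the field
    return out


if __name__ == "__main__":
    # Relaxed variant: the older client simply gets a map without the field.
    print(encode_incremental(True, False, strict=False))
    # Strict variant: the same request aborts the encoder.
    try:
        encode_incremental(True, False, strict=True)
    except AssertionError as err:
        print("crash:", err)
```

In this model, a newer client is unaffected either way; only the combination of stretch mode and an older encoding version trips the strict check, which matches the description of the crash being specific to stretch mode.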