Lots of OSDs with failed asserts

Hi,

more and more OSDs are now crashing constantly, and I've lost more OSDs
than my replication allows, so all my data is currently down or inactive.

Can somebody help me fix these asserts and get the OSDs up again (so I
can start my disaster recovery backup)?
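
In case it helps: since the assert fires in OSD::do_recovery, my current
idea (just a guess on my side, I don't know if it is safe or sufficient)
is to pause recovery and backfill cluster-wide before restarting the
crashed OSDs, roughly:

$ ceph osd set norecover
$ ceph osd set nobackfill

and to unset both flags again once the OSDs stay up.

This is what one of the crashed OSDs prints when I start it in the
foreground: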

$ sudo /usr/bin/ceph-osd -f --cluster ceph --id 10 --setuser ceph
--setgroup ceph

2022-11-02T22:02:43.482+0100 ffffb8198040 -1 Falling back to public interface
2022-11-02T22:03:33.301+0100 ffffb8198040 -1 osd.10 30473 log_to_monitors true
2022-11-02T22:03:34.484+0100 ffffabdcbb00 -1 osd.10 30473
set_numa_affinity unable to identify public interface '' numa node:
(2) No such file or directory
/mnt/ceph/src/ceph-17.2.4/src/osd/OSD.cc: In function 'void
OSD::do_recovery(PG*, epoch_t, uint64_t, ThreadPool::TPHandle&)'
thread ffff9733bb00 time 2022-11-02T22:03:37.276509+0100
/mnt/ceph/src/ceph-17.2.4/src/osd/OSD.cc: 9676: FAILED
ceph_assert(started <= reserved_pushes)
 ceph version 17.2.4 (1353ed37dec8d74973edc3d5d5908c20ad5a7332) quincy (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x134) [0xaaaabcf5f74c]
 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char
const*, char const*, ...)+0) [0xaaaabcf5f8c8]
 3: (OSD::do_recovery(PG*, unsigned int, unsigned long,
ThreadPool::TPHandle&)+0x4f4) [0xaaaabcfee554]
 4: (ceph::osd::scheduler::PGRecovery::run(OSD*, OSDShard*,
boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x28)
[0xaaaabd276398]
 5: (OSD::ShardedOpWQ::_process(unsigned int,
ceph::heartbeat_handle_d*)+0x574) [0xaaaabcfeebb4]
 6: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x308)
[0xaaaabd6687e8]
 7: (ShardedThreadPool::WorkThreadSharded::entry()+0x18) [0xaaaabd66afe8]
 8: /usr/lib/libc.so.6(+0x80aec) [0xffffb6cc0aec]
 9: /usr/lib/libc.so.6(+0xea5dc) [0xffffb6d2a5dc]
2022-11-02T22:03:37.280+0100 ffff9733bb00 -1
/mnt/ceph/src/ceph-17.2.4/src/osd/OSD.cc: In function 'void
OSD::do_recovery(PG*, epoch_t, uint64_t, ThreadPool::TPHandle&)'
thread ffff9733bb00 time 2022-11-02T22:03:37.276509+0100
/mnt/ceph/src/ceph-17.2.4/src/osd/OSD.cc: 9676: FAILED
ceph_assert(started <= reserved_pushes)

 ceph version 17.2.4 (1353ed37dec8d74973edc3d5d5908c20ad5a7332) quincy (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x134) [0xaaaabcf5f74c]
 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char
const*, char const*, ...)+0) [0xaaaabcf5f8c8]
 3: (OSD::do_recovery(PG*, unsigned int, unsigned long,
ThreadPool::TPHandle&)+0x4f4) [0xaaaabcfee554]
 4: (ceph::osd::scheduler::PGRecovery::run(OSD*, OSDShard*,
boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x28)
[0xaaaabd276398]
 5: (OSD::ShardedOpWQ::_process(unsigned int,
ceph::heartbeat_handle_d*)+0x574) [0xaaaabcfeebb4]
 6: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x308)
[0xaaaabd6687e8]
 7: (ShardedThreadPool::WorkThreadSharded::entry()+0x18) [0xaaaabd66afe8]
 8: /usr/lib/libc.so.6(+0x80aec) [0xffffb6cc0aec]
 9: /usr/lib/libc.so.6(+0xea5dc) [0xffffb6d2a5dc]

*** Caught signal (Aborted) **
 in thread ffff9733bb00 thread_name:tp_osd_tp
 ceph version 17.2.4 (1353ed37dec8d74973edc3d5d5908c20ad5a7332) quincy (stable)
 1: __kernel_rt_sigreturn()
 2: /usr/lib/libc.so.6(+0x82790) [0xffffb6cc2790]
 3: raise()
 4: abort()
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x188) [0xaaaabcf5f7a0]
 6: (ceph::__ceph_assertf_fail(char const*, char const*, int, char
const*, char const*, ...)+0) [0xaaaabcf5f8c8]
 7: (OSD::do_recovery(PG*, unsigned int, unsigned long,
ThreadPool::TPHandle&)+0x4f4) [0xaaaabcfee554]
 8: (ceph::osd::scheduler::PGRecovery::run(OSD*, OSDShard*,
boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x28)
[0xaaaabd276398]
 9: (OSD::ShardedOpWQ::_process(unsigned int,
ceph::heartbeat_handle_d*)+0x574) [0xaaaabcfeebb4]
 10: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x308)
[0xaaaabd6687e8]
 11: (ShardedThreadPool::WorkThreadSharded::entry()+0x18) [0xaaaabd66afe8]
 12: /usr/lib/libc.so.6(+0x80aec) [0xffffb6cc0aec]
 13: /usr/lib/libc.so.6(+0xea5dc) [0xffffb6d2a5dc]
2022-11-02T22:03:37.284+0100 ffff9733bb00 -1 *** Caught signal (Aborted) **
 in thread ffff9733bb00 thread_name:tp_osd_tp

 ceph version 17.2.4 (1353ed37dec8d74973edc3d5d5908c20ad5a7332) quincy (stable)
 1: __kernel_rt_sigreturn()
 2: /usr/lib/libc.so.6(+0x82790) [0xffffb6cc2790]
 3: raise()
 4: abort()
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x188) [0xaaaabcf5f7a0]
 6: (ceph::__ceph_assertf_fail(char const*, char const*, int, char
const*, char const*, ...)+0) [0xaaaabcf5f8c8]
 7: (OSD::do_recovery(PG*, unsigned int, unsigned long,
ThreadPool::TPHandle&)+0x4f4) [0xaaaabcfee554]
 8: (ceph::osd::scheduler::PGRecovery::run(OSD*, OSDShard*,
boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x28)
[0xaaaabd276398]
 9: (OSD::ShardedOpWQ::_process(unsigned int,
ceph::heartbeat_handle_d*)+0x574) [0xaaaabcfeebb4]
 10: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x308)
[0xaaaabd6687e8]
 11: (ShardedThreadPool::WorkThreadSharded::entry()+0x18) [0xaaaabd66afe8]
 12: /usr/lib/libc.so.6(+0x80aec) [0xffffb6cc0aec]
 13: /usr/lib/libc.so.6(+0xea5dc) [0xffffb6d2a5dc]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.

 -9999> 2022-11-02T22:03:34.484+0100 ffffabdcbb00 -1 osd.10 30473
set_numa_affinity unable to identify public interface '' numa node:
(2) No such file or directory
 -9998> 2022-11-02T22:03:37.280+0100 ffff9733bb00 -1
/mnt/ceph/src/ceph-17.2.4/src/osd/OSD.cc: In function 'void
OSD::do_recovery(PG*, epoch_t, uint64_t, ThreadPool::TPHandle&)'
thread ffff9733bb00 time 2022-11-02T22:03:37.276509+0100
/mnt/ceph/src/ceph-17.2.4/src/osd/OSD.cc: 9676: FAILED
ceph_assert(started <= reserved_pushes)

 ceph version 17.2.4 (1353ed37dec8d74973edc3d5d5908c20ad5a7332) quincy (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x134) [0xaaaabcf5f74c]
 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char
const*, char const*, ...)+0) [0xaaaabcf5f8c8]
 3: (OSD::do_recovery(PG*, unsigned int, unsigned long,
ThreadPool::TPHandle&)+0x4f4) [0xaaaabcfee554]
 4: (ceph::osd::scheduler::PGRecovery::run(OSD*, OSDShard*,
boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x28)
[0xaaaabd276398]
 5: (OSD::ShardedOpWQ::_process(unsigned int,
ceph::heartbeat_handle_d*)+0x574) [0xaaaabcfeebb4]
 6: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x308)
[0xaaaabd6687e8]
 7: (ShardedThreadPool::WorkThreadSharded::entry()+0x18) [0xaaaabd66afe8]
 8: /usr/lib/libc.so.6(+0x80aec) [0xffffb6cc0aec]
 9: /usr/lib/libc.so.6(+0xea5dc) [0xffffb6d2a5dc]

 -9997> 2022-11-02T22:03:37.284+0100 ffff9733bb00 -1 *** Caught signal
(Aborted) **
 in thread ffff9733bb00 thread_name:tp_osd_tp

 ceph version 17.2.4 (1353ed37dec8d74973edc3d5d5908c20ad5a7332) quincy (stable)
 1: __kernel_rt_sigreturn()
 2: /usr/lib/libc.so.6(+0x82790) [0xffffb6cc2790]
 3: raise()
 4: abort()
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x188) [0xaaaabcf5f7a0]
 6: (ceph::__ceph_assertf_fail(char const*, char const*, int, char
const*, char const*, ...)+0) [0xaaaabcf5f8c8]
 7: (OSD::do_recovery(PG*, unsigned int, unsigned long,
ThreadPool::TPHandle&)+0x4f4) [0xaaaabcfee554]
 8: (ceph::osd::scheduler::PGRecovery::run(OSD*, OSDShard*,
boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x28)
[0xaaaabd276398]
 9: (OSD::ShardedOpWQ::_process(unsigned int,
ceph::heartbeat_handle_d*)+0x574) [0xaaaabcfeebb4]
 10: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x308)
[0xaaaabd6687e8]
 11: (ShardedThreadPool::WorkThreadSharded::entry()+0x18) [0xaaaabd66afe8]
 12: /usr/lib/libc.so.6(+0x80aec) [0xffffb6cc0aec]
 13: /usr/lib/libc.so.6(+0xea5dc) [0xffffb6d2a5dc]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.
