Gregory, I have raised a ticket already: https://tracker.ceph.com/issues/52445

Amudhan

On Tue, Aug 31, 2021 at 12:00 AM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> Hmm, this ceph_assert hasn't shown up in my email before. It looks
> like there may be a soft-state bug in Octopus. Can you file a ticket
> at tracker.ceph.com with the backtrace and OSD log file? (Log locations
> are sketched after the thread below.) We can direct that to the RADOS
> team to check out.
> -Greg
>
> On Sat, Aug 28, 2021 at 7:13 AM Amudhan P <amudhan83@xxxxxxxxx> wrote:
> >
> > Hi,
> >
> > I am having a peculiar problem with my Ceph Octopus cluster. Two weeks ago
> > an issue started with too many scrub errors; later, random OSDs stopped,
> > which led to corrupt PGs and missing replicas. Since it's a testing
> > cluster, I wanted to understand the issue.
> > I tried to recover the PGs, but it didn't help. When I set the `norecover`,
> > `norebalance`, and `nodown` flags, the OSD services keep running without
> > stopping (the flag commands are sketched after the thread below).
> >
> > I have gone through the steps in the Ceph OSD troubleshooting guide, but
> > nothing helped or led me to the cause.
> >
> > I mailed the list earlier but couldn't get a solution.
> >
> > Any help in finding the cause would be appreciated.
> >
> > *Error message from one of the OSDs that failed:*
> > /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.7/rpm/el8/BUILD/ceph-15.2.7/src/osd/OSD.cc: 9521: FAILED ceph_assert(started <= reserved_pushes)
> >
> > ceph version 15.2.7 (88e41c6c49beb18add4fdb6b4326ca466d931db8) octopus (stable)
> > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x55fcb6621dbe]
> > 2: (()+0x504fd8) [0x55fcb6621fd8]
> > 3: (OSD::do_recovery(PG*, unsigned int, unsigned long, ThreadPool::TPHandle&)+0x5f5) [0x55fcb6704c25]
> > 4: (ceph::osd::scheduler::PGRecovery::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x1d) [0x55fcb6960a3d]
> > 5: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x12ef) [0x55fcb67224df]
> > 6: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5c4) [0x55fcb6d5b224]
> > 7: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x55fcb6d5de84]
> > 8: (()+0x82de) [0x7f04c1b1c2de]
> > 9: (clone()+0x43) [0x7f04c0853e83]
> >
> > 0> 2021-08-28T13:53:37.444+0000 7f04a128d700 -1 *** Caught signal (Aborted) **
> > in thread 7f04a128d700 thread_name:tp_osd_tp
> >
> > ceph version 15.2.7 (88e41c6c49beb18add4fdb6b4326ca466d931db8) octopus (stable)
> > 1: (()+0x12dd0) [0x7f04c1b26dd0]
> > 2: (gsignal()+0x10f) [0x7f04c078f70f]
> > 3: (abort()+0x127) [0x7f04c0779b25]
> > 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x55fcb6621e0f]
> > 5: (()+0x504fd8) [0x55fcb6621fd8]
> > 6: (OSD::do_recovery(PG*, unsigned int, unsigned long, ThreadPool::TPHandle&)+0x5f5) [0x55fcb6704c25]
> > 7: (ceph::osd::scheduler::PGRecovery::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x1d) [0x55fcb6960a3d]
> > 8: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x12ef) [0x55fcb67224df]
> > 9: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5c4) [0x55fcb6d5b224]
> > 10: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x55fcb6d5de84]
> > 11: (()+0x82de) [0x7f04c1b1c2de]
> > 12: (clone()+0x43) [0x7f04c0853e83]
> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
> >
> > Thanks
> > Amudhan
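
For reference, the recovery flags mentioned above are cluster-wide OSD flags set with the `ceph` CLI from any host with admin access. A minimal sketch (flag names as in the post; keyring and cluster name assumed to be the defaults):

    # pause recovery and rebalancing, and stop OSDs from being marked down
    ceph osd set norecover
    ceph osd set norebalance
    ceph osd set nodown

    # clear the flags again once the failing OSDs have been diagnosed
    ceph osd unset norecover
    ceph osd unset norebalance
    ceph osd unset nodown

That the OSDs stay up with these flags set is presumably consistent with the assert sitting in OSD::do_recovery(): with `norecover` in place, no recovery work is scheduled, so the failing code path is likely never entered.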
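And for anyone needing to gather the material Greg asked for (the backtrace plus the OSD log), a sketch of where it usually lives on a package-based, non-containerized install; the OSD id and crash id are placeholders:

    # full OSD log, including the assert and backtrace, on the host running that OSD
    less /var/log/ceph/ceph-osd.<id>.log

    # or pull the same messages from the systemd journal for that OSD
    journalctl -u ceph-osd@<id> --since "2021-08-28"

    # Octopus's crash module should also have recorded the abort
    ceph crash ls
    ceph crash info <crash-id>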