ceph-14.2.22 OSD crashing - PrimaryLogPG::hit_set_trim on unfound object

Hello dear ceph users and developers,

Today we hit an issue on one of our legacy clusters running 14.2.22.

We've apparently lost two objects containing cache tier hit_set history.

When hit_set_trim hits a missing object, it causes the OSD to crash:

 ceph version 14.2.22 (ca74598065096e6fcbd8433c8779a2be0c889351) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x141) [0x55714332faad]
 2: (()+0x4e8ca8) [0x55714332fca8]
 3: (PrimaryLogPG::hit_set_trim(std::unique_ptr<PrimaryLogPG::OpContext, std::default_delete<PrimaryLogPG::OpContext> >&, unsigned int)+0xcd6) [0x557143626136]
 4: (PrimaryLogPG::hit_set_remove_all()+0x2d5) [0x5571436264b5]
 5: (PrimaryLogPG::on_activate()+0x55a) [0x5571436360aa]
 6: (PG::RecoveryState::Active::react(PG::AllReplicasActivated const&)+0x130) [0x557143512b50]
 7: (boost::statechart::simple_state<PG::RecoveryState::Active, PG::RecoveryState::Primary, PG::RecoveryState::Activating, (boost::statechart::history_mode)0>::react_impl(boost::statechart::
 8: (boost::statechart::simple_state<PG::RecoveryState::Activating, PG::RecoveryState::Active, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na
 9: (PG::do_peering_event(std::shared_ptr<PGPeeringEvent>, PG::RecoveryCtx*)+0x15d) [0x55714353c7ed]
 10: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0x211) [0x5571434555d1]
 11: (PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x52) [0x557143723352]
 12: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x582) [0x557143443122]
 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x3eb) [0x557143ac2f4b]
 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x557143ac5d30]
 15: (()+0x7ea5) [0x7f22383e5ea5]
 16: (clone()+0x6d) [0x7f22372a8b0d]
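
In case it's relevant: the remaining hit_set archive objects should be listable with something like the command below (the cache pool name is just a placeholder; as far as I understand the archives live in an internal rados namespace, hence --all, though I'm not certain rados shows everything there):

 rados -p <cache-pool> ls --all | grep hit_set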


I've found an old report of a similar problem here: https://www.spinics.net/lists/ceph-users/msg48660.html

However, applying the same patch:

diff -Naur ceph-14.2.22/src/osd/PrimaryLogPG.cc ceph-14.2.22-fix-hit_set_trim/src/osd/PrimaryLogPG.cc
--- ceph-14.2.22/src/osd/PrimaryLogPG.cc        2024-12-29 12:34:17.000000000 +0100
+++ ceph-14.2.22-fix-hit_set_trim/src/osd/PrimaryLogPG.cc       2024-12-29 19:42:31.527632776 +0100
@@ -13932,11 +13932,13 @@
     updated_hit_set_hist.history.pop_front();

     ObjectContextRef obc = get_object_context(oid, false);
-    ceph_assert(obc);
-    --ctx->delta_stats.num_objects;
-    --ctx->delta_stats.num_objects_hit_set_archive;
-    ctx->delta_stats.num_bytes -= obc->obs.oi.size;
-    ctx->delta_stats.num_bytes_hit_set_archive -= obc->obs.oi.size;
+    //ceph_assert(obc);
+    if (obc) {
+      --ctx->delta_stats.num_objects;
+      --ctx->delta_stats.num_objects_hit_set_archive;
+      ctx->delta_stats.num_bytes -= obc->obs.oi.size;
+      ctx->delta_stats.num_bytes_hit_set_archive -= obc->obs.oi.size;
+    }
   }
 }


only causes the OSD to crash later:

/usr/src/redhat/BUILD/ceph-14.2.22/src/osd/PrimaryLogPG.cc: In function 'virtual void PrimaryLogPG::op_applied(const eversion_t&)' thread 7f91e146d700 time 2024-12-29 19:58:13.880029
/usr/src/redhat/BUILD/ceph-14.2.22/src/osd/PrimaryLogPG.cc: 10457: FAILED ceph_assert(applied_version <= info.last_update)
/usr/src/redhat/BUILD/ceph-14.2.22/src/osd/PrimaryLogPG.cc: In function 'virtual void PrimaryLogPG::op_applied(const eversion_t&)' thread 7f91df469700 time 2024-12-29 19:58:13.882629
/usr/src/redhat/BUILD/ceph-14.2.22/src/osd/PrimaryLogPG.cc: 10457: FAILED ceph_assert(applied_version <= info.last_update)
 ceph version 14.2.22 (ca74598065096e6fcbd8433c8779a2be0c889351) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x122) [0x55e07e95b815]
 2: (()+0x49899a) [0x55e07e95b99a]
 3: (PrimaryLogPG::op_applied(eversion_t const&)+0x1f2) [0x55e07eb9fef2]
 4: (ReplicatedBackend::submit_transaction(hobject_t const&, object_stat_sum_t const&, eversion_t const&, std::unique_ptr<PGTransaction, std::default_delete<PGTransaction> >&&, eversion_t const&, eversion_t const&, std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> > const&, boost::optional<pg_hit_set_history_t>&, Context*, unsigned long, osd_reqid_t, boost::intrusive_ptr<OpRequest>)+0x720) [0x55e07ed411b0]
 5: (PrimaryLogPG::issue_repop(PrimaryLogPG::RepGather*, PrimaryLogPG::OpContext*)+0xdc5) [0x55e07eba0cd5]
 6: (PrimaryLogPG::simple_opc_submit(std::unique_ptr<PrimaryLogPG::OpContext, std::default_delete<PrimaryLogPG::OpContext> >)+0x9b) [0x55e07eba2acb]
 7: (PrimaryLogPG::hit_set_remove_all()+0x2e5) [0x55e07ebdcbf5]
 8: (PrimaryLogPG::on_pool_change()+0xeb) [0x55e07ebddc7b]
 9: (PG::handle_advance_map(std::shared_ptr<OSDMap const>, std::shared_ptr<OSDMap const>, std::vector<int, std::allocator<int> >&, int, std::vector<int, std::allocator<int> >&, int, PG::RecoveryCtx*)+0x34f) [0x55e07eaf8c4f]
 10: (OSD::advance_pg(unsigned int, PG*, ThreadPool::TPHandle&, PG::RecoveryCtx*)+0x2f6) [0x55e07ea482e6]
 11: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0x1b4) [0x55e07ea50a24]
 12: (PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x4e) [0x55e07ecde60e]
 13: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xfea) [0x55e07ea5527a]
 14: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x415) [0x55e07f032345]
 15: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55e07f034740]
 16: (()+0x7ea5) [0x7f9203024ea5]
 17: (clone()+0x6d) [0x7f9201ee7b0d]


Unfortunately, at this point I don't have any clue how to proceed further.

I've increased hit_set_count to 32 and hit_set_period to 36000, thus
hopefully gaining some time (for now the OSDs seem to be running).
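
For reference, those were plain pool settings on the cache pool (the pool name is a placeholder):

 ceph osd pool set <cache-pool> hit_set_count 32
 ceph osd pool set <cache-pool> hit_set_period 36000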

Any ideas on how to safely get out of this mess? I can't easily get rid of the cache tier now,
since it's used by running VMs, and what's worse, I'm not sure I won't hit the same problem
when deleting the cache pool anyway (the usual removal sequence is sketched below). The OSDs are
shared between the cache pool and the NVMe data pool, so it's a pretty uncomfortable situation :-(
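
My understanding of the usual cache tier removal sequence is roughly the following (pool names are placeholders), and my worry is that the flush/evict and the tier removal would walk the same hit_set history and trip the same assert:

 ceph osd tier cache-mode <cache-pool> proxy
 rados -p <cache-pool> cache-flush-evict-all
 ceph osd tier remove-overlay <base-pool>
 ceph osd tier remove <base-pool> <cache-pool>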

I'm using BlueStore for all OSDs, so I can't even try copying another hit_set object into place on disk, as I could have done with FileStore..

I'll be very grateful for any help.

with best regards

nikola ciprich



