Re: ceph-14.2.22 OSD crashing - PrimaryLogPG::hit_set_trim on unfound object

Eugen Block <eblock@xxxxxx> · Wed, 29 Jan 2025 08:44:30 +0000

Hi,

did you make any progress with this? I can't really help with the
stack trace. I'm just glad that we successfully decommissioned our
cache tier last week (although it served us very well for almost nine
years).

You wrote that those cache tier OSDs are used for both the cache tier
and data pools. Maybe you could move the cache tier to different OSDs
so there's no mixed use? I'm not very optimistic that this would
mitigate the issue, though: we had such a setup for months during the
transition towards removing the cache tier. In the end we had to shut
down all VMs in order to get rid of the cache tier safely, because it
wouldn't let us flush the remaining header objects. But now we're
finally in a state where we can plan the next upgrade.
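For reference, the two steps mentioned above (moving the cache pool to its own OSDs, and later removing the tier) could look roughly like the sketch below. It follows the removal sequence for a writeback cache as I remember it from the Ceph cache tiering documentation; it is not the exact procedure we ran. All pool, rule, root and device-class names are made-up placeholders, and the script only prints the commands unless APPLY=1 is set.

```shell
#!/bin/sh
# Sketch only. Every name below is a placeholder, not taken from this thread.
CACHE_POOL="${CACHE_POOL:-hot-cache}"      # hypothetical cache pool
BASE_POOL="${BASE_POOL:-cold-data}"        # hypothetical backing pool
CACHE_RULE="${CACHE_RULE:-cache-only}"     # hypothetical CRUSH rule name
CRUSH_ROOT="${CRUSH_ROOT:-default}"
FAILURE_DOMAIN="${FAILURE_DOMAIN:-host}"
DEVICE_CLASS="${DEVICE_CLASS:-ssd}"        # class of the dedicated OSDs

# Print each command by default; execute it only when APPLY=1.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Give the cache pool its own CRUSH rule so it stops sharing OSDs.
move_cache_pool() {
    run ceph osd crush rule create-replicated \
        "$CACHE_RULE" "$CRUSH_ROOT" "$FAILURE_DOMAIN" "$DEVICE_CLASS"
    run ceph osd pool set "$CACHE_POOL" crush_rule "$CACHE_RULE"
}

# Documented order for retiring a writeback cache tier.
remove_cache_tier() {
    # 1. Proxy I/O so no new objects are promoted into the cache.
    run ceph osd tier cache-mode "$CACHE_POOL" proxy
    # 2. Flush dirty objects and evict everything from the cache pool.
    run rados -p "$CACHE_POOL" cache-flush-evict-all
    # 3. Stop redirecting client I/O through the cache.
    run ceph osd tier remove-overlay "$BASE_POOL"
    # 4. Detach the cache pool from the backing pool.
    run ceph osd tier remove "$BASE_POOL" "$CACHE_POOL"
}

case "${1:-all}" in
    move)   move_cache_pool ;;
    remove) remove_cache_tier ;;
    all)    move_cache_pool; remove_cache_tier ;;
esac
```

Step 2 is where we got stuck: the flush would not complete for the remaining header objects while clients were still attached.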

Regards,
Eugen

Quoting Nikola Ciprich <nikola.ciprich@xxxxxxxxxxx>:

Hello dear ceph users and developers,

today we hit an issue on one of our legacy clusters running 14.2.22.

We've apparently lost two objects containing cache tier hit_set history.

When hit_set_trim hits a missing object, it causes the OSD to crash:

 ceph version 14.2.22 (ca74598065096e6fcbd8433c8779a2be0c889351) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x141) [0x55714332faad]
 2: (()+0x4e8ca8) [0x55714332fca8]
 3: (PrimaryLogPG::hit_set_trim(std::unique_ptr<PrimaryLogPG::OpContext, std::default_delete<PrimaryLogPG::OpContext> >&, unsigned int)+0xcd6) [0x557143626136]
 4: (PrimaryLogPG::hit_set_remove_all()+0x2d5) [0x5571436264b5]
 5: (PrimaryLogPG::on_activate()+0x55a) [0x5571436360aa]
 6: (PG::RecoveryState::Active::react(PG::AllReplicasActivated const&)+0x130) [0x557143512b50]
 7: (boost::statechart::simple_state<PG::RecoveryState::Active, PG::RecoveryState::Primary, PG::RecoveryState::Activating, (boost::statechart::history_mode)0>::react_impl(boost::statechart::
 8: (boost::statechart::simple_state<PG::RecoveryState::Activating, PG::RecoveryState::Active, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na
 9: (PG::do_peering_event(std::shared_ptr<PGPeeringEvent>, PG::RecoveryCtx*)+0x15d) [0x55714353c7ed]
 10: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0x211) [0x5571434555d1]
 11: (PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x52) [0x557143723352]
 12: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x582) [0x557143443122]
 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x3eb) [0x557143ac2f4b]
 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x557143ac5d30]
 15: (()+0x7ea5) [0x7f22383e5ea5]
 16: (clone()+0x6d) [0x7f22372a8b0d]
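For anyone trying to look at the same state on their own cluster, this is roughly how one could check what the cluster itself reports about unfound objects. It is only a sketch: the PG id is a placeholder, and I have not confirmed that the missing hit_set archive objects actually show up in this listing. On releases before Nautilus the second command was, as far as I know, called list_missing. The script prints the commands unless APPLY=1 is set.

```shell
#!/bin/sh
# Sketch only: PGID is a placeholder, not a PG from this cluster.
PGID="${PGID:-2.4}"

# Print each command by default; execute it only when APPLY=1.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

inspect_unfound() {
    # Cluster-wide summary; PGs with unfound objects are named here.
    run ceph health detail
    # Objects the PG knows about but cannot locate on any OSD.
    run ceph pg "$PGID" list_unfound
    # Peering and recovery state of the PG, for context.
    run ceph pg "$PGID" query
}

inspect_unfound
```

All three commands are read-only; nothing here marks objects as lost.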


I found an old report of a similar problem here:
https://www.spinics.net/lists/ceph-users/msg48660.html

However, applying the same patch:

diff -Naur ceph-14.2.22/src/osd/PrimaryLogPG.cc ceph-14.2.22-fix-hit_set_trim/src/osd/PrimaryLogPG.cc
--- ceph-14.2.22/src/osd/PrimaryLogPG.cc        2024-12-29 12:34:17.000000000 +0100
+++ ceph-14.2.22-fix-hit_set_trim/src/osd/PrimaryLogPG.cc        2024-12-29 19:42:31.527632776 +0100
@@ -13932,11 +13932,13 @@
     updated_hit_set_hist.history.pop_front();

     ObjectContextRef obc = get_object_context(oid, false);
-    ceph_assert(obc);
-    --ctx->delta_stats.num_objects;
-    --ctx->delta_stats.num_objects_hit_set_archive;
-    ctx->delta_stats.num_bytes -= obc->obs.oi.size;
-    ctx->delta_stats.num_bytes_hit_set_archive -= obc->obs.oi.size;
+    // Tolerate a missing hit_set archive object instead of asserting.
+    if (obc) {
+      --ctx->delta_stats.num_objects;
+      --ctx->delta_stats.num_objects_hit_set_archive;
+      ctx->delta_stats.num_bytes -= obc->obs.oi.size;
+      ctx->delta_stats.num_bytes_hit_set_archive -= obc->obs.oi.size;
+    }
   }
 }


causes the OSD to crash later, on a different assertion:

/usr/src/redhat/BUILD/ceph-14.2.22/src/osd/PrimaryLogPG.cc: In function 'virtual void PrimaryLogPG::op_applied(const eversion_t&)' thread 7f91e146d700 time 2024-12-29 19:58:13.880029
/usr/src/redhat/BUILD/ceph-14.2.22/src/osd/PrimaryLogPG.cc: 10457: FAILED ceph_assert(applied_version <= info.last_update)
/usr/src/redhat/BUILD/ceph-14.2.22/src/osd/PrimaryLogPG.cc: In function 'virtual void PrimaryLogPG::op_applied(const eversion_t&)' thread 7f91df469700 time 2024-12-29 19:58:13.882629
/usr/src/redhat/BUILD/ceph-14.2.22/src/osd/PrimaryLogPG.cc: 10457: FAILED ceph_assert(applied_version <= info.last_update)
 ceph version 14.2.22 (ca74598065096e6fcbd8433c8779a2be0c889351) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x122) [0x55e07e95b815]
 2: (()+0x49899a) [0x55e07e95b99a]
 3: (PrimaryLogPG::op_applied(eversion_t const&)+0x1f2) [0x55e07eb9fef2]
 4: (ReplicatedBackend::submit_transaction(hobject_t const&, object_stat_sum_t const&, eversion_t const&, std::unique_ptr<PGTransaction, std::default_delete<PGTransaction> >&&, eversion_t const&, eversion_t const&, std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> > const&, boost::optional<pg_hit_set_history_t>&, Context*, unsigned long, osd_reqid_t, boost::intrusive_ptr<OpRequest>)+0x720) [0x55e07ed411b0]
 5: (PrimaryLogPG::issue_repop(PrimaryLogPG::RepGather*, PrimaryLogPG::OpContext*)+0xdc5) [0x55e07eba0cd5]
 6: (PrimaryLogPG::simple_opc_submit(std::unique_ptr<PrimaryLogPG::OpContext, std::default_delete<PrimaryLogPG::OpContext> >)+0x9b) [0x55e07eba2acb]
 7: (PrimaryLogPG::hit_set_remove_all()+0x2e5) [0x55e07ebdcbf5]
 8: (PrimaryLogPG::on_pool_change()+0xeb) [0x55e07ebddc7b]
 9: (PG::handle_advance_map(std::shared_ptr<OSDMap const>, std::shared_ptr<OSDMap const>, std::vector<int, std::allocator<int> >&, int, std::vector<int, std::allocator<int> >&, int, PG::RecoveryCtx*)+0x34f) [0x55e07eaf8c4f]
 10: (OSD::advance_pg(unsigned int, PG*, ThreadPool::TPHandle&, PG::RecoveryCtx*)+0x2f6) [0x55e07ea482e6]
 11: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0x1b4) [0x55e07ea50a24]
 12: (PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x4e) [0x55e07ecde60e]
 13: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xfea) [0x55e07ea5527a]
 14: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x415) [0x55e07f032345]
 15: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55e07f034740]
 16: (()+0x7ea5) [0x7f9203024ea5]
 17: (clone()+0x6d) [0x7f9201ee7b0d]


Unfortunately, I have no clue how to proceed from here.

I've increased hit_set_count to 32 and hit_set_period to 36000,
hopefully gaining some time (the OSDs seem to be running for now).
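In case it helps someone in the same spot, the change amounts to something like the sketch below. The pool name is a placeholder and the script only prints the commands unless APPLY=1 is set. The time estimate rests on an assumption: that one hit set is archived per period and that trimming only starts once the history holds more than hit_set_count entries. Under that assumption, 32 sets of 36000 s (10 h) each give at most 320 h, a bit over 13 days, before the oldest entries are trimmed again.

```shell
#!/bin/sh
# Sketch only: CACHE_POOL is a placeholder, not the real pool name.
CACHE_POOL="${CACHE_POOL:-hot-cache}"
HIT_SET_COUNT=32        # value from this thread
HIT_SET_PERIOD=36000    # seconds, value from this thread

# Print each command by default; execute it only when APPLY=1.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

set_hit_set_params() {
    run ceph osd pool set "$CACHE_POOL" hit_set_count "$HIT_SET_COUNT"
    run ceph osd pool set "$CACHE_POOL" hit_set_period "$HIT_SET_PERIOD"
}

# Upper bound on the retained history, in whole hours (count * period).
hit_set_window_hours() {
    echo $(( HIT_SET_COUNT * HIT_SET_PERIOD / 3600 ))
}

set_hit_set_params
echo "history window: up to $(hit_set_window_hours) hours"
```

The current values can be read back with "ceph osd pool get <pool> hit_set_count" and "ceph osd pool get <pool> hit_set_period".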

Any ideas on how to get out of this mess safely? I can't easily get
rid of the cache tier now, since it's used by running VMs. What's
worse, I'm not sure I won't hit the same problem when deleting the
cache pool anyway: the OSDs are shared by the cache pool and the NVMe
data pool, so it's a pretty uncomfortable situation :-(

I'm using BlueStore for all OSDs, so I can't even try copying another
hit_set object the way one could with FileStore.

I'd be very grateful for any help.

with best regards

nikola ciprich




_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx

