Hi folks,

We had to delete some unfound objects in our cache pool to get our cluster working again. About an hour later, however, OSDs started crashing. We found that the crash is caused by our deletion of this object:

"hit_set_8.3fc_archive_2021-09-09 08:25:58.520768Z_2021-09-09 08:26:18.907234Z"

The crash log can be found here: https://paste.openstack.org/show/809211/

Our plan now is to patch the OSD code so that it skips the assert and the stats update when the object context is missing. That should let us bring the OSD back online and then remove the cache tier:

diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc
index 3b3e3e59292..a06fec9c269 100644
--- a/src/osd/PrimaryLogPG.cc
+++ b/src/osd/PrimaryLogPG.cc
@@ -13932,11 +13932,13 @@ void PrimaryLogPG::hit_set_trim(OpContextUPtr &ctx, unsigned max)
     updated_hit_set_hist.history.pop_front();

     ObjectContextRef obc = get_object_context(oid, false);
-    ceph_assert(obc);
+    //ceph_assert(obc);
+    if (obc) {
     --ctx->delta_stats.num_objects;
     --ctx->delta_stats.num_objects_hit_set_archive;
     ctx->delta_stats.num_bytes -= obc->obs.oi.size;
     ctx->delta_stats.num_bytes_hit_set_archive -= obc->obs.oi.size;
+    }
   }
 }

Has anyone done this before, or does anyone know another workaround to get the OSD back online?

Thanks in advance,
Ansgar

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx