OSDs crash after deleting unfound object in Nautilus 14.2.22

Hi Folks,

We had to delete some unfound objects in our cache tier to get our
cluster working again, but about an hour later the OSDs started crashing.

We found that the crashes are caused by our having deleted this object:
 "hit_set_8.3fc_archive_2021-09-09 08:25:58.520768Z_2021-09-09 08:26:18.907234Z"

The crash log can be found here: https://paste.openstack.org/show/809211/

Our plan is now to patch the OSD code so that it skips the stats update
when the hit set archive object is missing, in order to get the OSDs
back online and then remove the cache tier:

diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc
index 3b3e3e59292..a06fec9c269 100644
--- a/src/osd/PrimaryLogPG.cc
+++ b/src/osd/PrimaryLogPG.cc
@@ -13932,11 +13932,13 @@ void PrimaryLogPG::hit_set_trim(OpContextUPtr &ctx, unsigned max)
     updated_hit_set_hist.history.pop_front();

     ObjectContextRef obc = get_object_context(oid, false);
-    ceph_assert(obc);
+    //ceph_assert(obc);
+    if (obc) {
     --ctx->delta_stats.num_objects;
     --ctx->delta_stats.num_objects_hit_set_archive;
     ctx->delta_stats.num_bytes -= obc->obs.oi.size;
     ctx->delta_stats.num_bytes_hit_set_archive -= obc->obs.oi.size;
+    }
   }
 }
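
To make the intent of the patch easier to discuss, here is a minimal
standalone sketch of the same guard, outside of Ceph. Everything in it
(ObjectInfo, DeltaStats, ObjectStore, lookup, trim_history) is a
hypothetical stand-in and not the real PrimaryLogPG code; it only assumes
that the lookup returns a null pointer for an archive object that has
been deleted.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-ins for illustration only; these are NOT Ceph types.
struct ObjectInfo {
  std::uint64_t size = 0;
};

struct DeltaStats {
  std::int64_t num_objects = 0;
  std::int64_t num_objects_hit_set_archive = 0;
  std::int64_t num_bytes = 0;
  std::int64_t num_bytes_hit_set_archive = 0;
};

using ObjectStore = std::map<std::string, std::shared_ptr<ObjectInfo>>;

// Returns nullptr when the object no longer exists (e.g. it was deleted).
std::shared_ptr<ObjectInfo> lookup(const ObjectStore& store,
                                   const std::string& oid) {
  auto it = store.find(oid);
  return it == store.end() ? nullptr : it->second;
}

// Trims the history down to at most `max` entries, oldest first.
// Stats are only adjusted for entries whose object is still present;
// the return value is the number of trimmed entries that were missing.
unsigned trim_history(std::deque<std::string>& history,
                      const ObjectStore& store,
                      DeltaStats& delta,
                      unsigned max) {
  unsigned missing = 0;
  while (history.size() > max) {
    const std::string oid = history.front();
    history.pop_front();

    auto obc = lookup(store, oid);
    if (!obc) {
      // Assumption: skipping is acceptable here instead of asserting.
      ++missing;
      continue;
    }
    const auto size = static_cast<std::int64_t>(obc->size);
    --delta.num_objects;
    --delta.num_objects_hit_set_archive;
    delta.num_bytes -= size;
    delta.num_bytes_hit_set_archive -= size;
  }
  return missing;
}
```

Returning the number of missing entries is only a suggestion: it would
let the caller log how often the guard is hit. The sketch says nothing
about whether the PG stats stay accurate after such a skip.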

Has anyone done this before, or does anyone have another workaround to
get the OSDs back online?

Thanks in advance,
Ansgar