Re: Multiple OSD crashing a lot

Hein-Pieter van Braam <hp@xxxxxx> · Sat, 13 Aug 2016 17:26:27 +0200

Hi Blade,

I was planning to do something similar: run the OSD the way you
describe, use object copy to copy the data to a new volume, and then
move the clients to the new volume.

Thanks a lot,

- HP

On Sat, 2016-08-13 at 08:18 -0700, Blade Doyle wrote:
> Hi HP.
> 
> Mine was not really a fix, it was just a hack to get the OSD up long
> enough to make sure I had a full backup; then I rebuilt the cluster
> from scratch and restored the data. Though the hack did stop the OSD
> from crashing, the crash is probably a symptom of some internal
> problem, and it may not be "safe" to run like that in the long term.
> 
> The change was something like this:
> 
> Ref: https://github.com/ceph/ceph/blob/master/src/osd/ReplicatedPG.cc
> 
> I changed this:
> 
> ObjectContextRef obc = get_object_context(oid, false);
> assert(obc);
> --ctx->delta_stats.num_objects;
> --ctx->delta_stats.num_objects_hit_set_archive;
> ctx->delta_stats.num_bytes -= obc->obs.oi.size;
> ctx->delta_stats.num_bytes_hit_set_archive -= obc->obs.oi.size;
> 
> to this:
> 
> ObjectContextRef obc = 0; // get_object_context(oid, false);
> assert(obc);
> --ctx->delta_stats.num_objects;
> --ctx->delta_stats.num_objects_hit_set_archive;
> if (obc)
> {
>   ctx->delta_stats.num_bytes -= obc->obs.oi.size;
>   ctx->delta_stats.num_bytes_hit_set_archive -= obc->obs.oi.size;
> }
> 
> 
> Good luck!
> Blade.
> 
> 
> On Sat, Aug 13, 2016 at 5:52 AM, Hein-Pieter van Braam <hp@xxxxxx>
> wrote:
> > Hi Blade,
> > 
> > I appear to be stuck in the same situation you were in. Do you still
> > happen to have a patch to implement this workaround you described?
> > 
> > Thanks,
> > 
> > - HP
> > 
> 
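
For readers who want to see the guarded update from Blade's hack in isolation, here is a minimal, self-contained C++ sketch. It is not Ceph code: `ObjectInfo`, `ObjectState`, `ObjectContext`, `DeltaStats`, and `remove_hit_set_stats` are hypothetical stand-ins that only mirror the member names visible in the quoted snippet, and the use of `std::shared_ptr` for `ObjectContextRef` is an assumption made for illustration. The sketch only shows the pattern of always decrementing the object counters while touching the byte counters only when an object context exists. It says nothing about whether skipping the byte accounting is safe on a real cluster.

```cpp
#include <cstdint>
#include <memory>

// Hypothetical stand-ins for the types named in the quoted patch.
// The real definitions live in the Ceph source tree and are much larger.
struct ObjectInfo { uint64_t size = 0; };
struct ObjectState { ObjectInfo oi; };
struct ObjectContext { ObjectState obs; };
using ObjectContextRef = std::shared_ptr<ObjectContext>;  // assumption

struct DeltaStats {
  int64_t num_objects = 0;
  int64_t num_objects_hit_set_archive = 0;
  int64_t num_bytes = 0;
  int64_t num_bytes_hit_set_archive = 0;
};

// Decrement the object counters unconditionally, but adjust the byte
// counters only when an object context is available.
void remove_hit_set_stats(DeltaStats& delta, const ObjectContextRef& obc) {
  --delta.num_objects;
  --delta.num_objects_hit_set_archive;
  if (obc) {
    const int64_t size = static_cast<int64_t>(obc->obs.oi.size);
    delta.num_bytes -= size;
    delta.num_bytes_hit_set_archive -= size;
  }
}
```

Called with a null context, the function changes only the two object counters. Called with a context whose `oi.size` is 4096, it also subtracts 4096 from both byte counters.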