On Fri, Nov 15, 2019 at 4:45 PM Joao Eduardo Luis <joao@xxxxxxx> wrote:
>
> On 19/11/14 11:04AM, Gregory Farnum wrote:
> > On Thu, Nov 14, 2019 at 8:14 AM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> > >
> > > Hi Joao,
> > >
> > > I might have found the reason why several of our clusters (and maybe
> > > Bryan's too) are getting stuck not trimming osdmaps.
> > > It seems that when an osd fails, the min_last_epoch_clean gets stuck
> > > forever (even long after HEALTH_OK), until the ceph-mons are restarted.
> > >
> > > I've updated the ticket: https://tracker.ceph.com/issues/41154
> >
> > Wrong ticket, I think you meant https://tracker.ceph.com/issues/37875#note-7
>
> I've seen this behavior a long, long time ago, but stopped being able to
> reproduce it consistently enough to ensure the patch was working properly.
>
> I think I have a patch here:
>
> https://github.com/ceph/ceph/pull/19076/commits
>
> If you are feeling adventurous, and want to give it a try, let me know.
> I'll be happy to forward port it to whatever you are running.

Thanks Joao, this patch is what I had in mind.

I'm trying to evaluate how adventurous this would be -- is there any risk
that if a huge number of OSDs are down all at once (but only transiently),
it would trigger the mon to trim too many maps? I would expect that the
remaining up OSDs would still report a safe, low osd_epoch, right?

In any case, I guess your proposed get_min_last_epoch_clean patch is
equivalent to what we have today if we restart the ceph-mon leader while
an OSD is down.

-- Dan
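
To make the tradeoff Dan is asking about concrete, below is a minimal,
hypothetical C++ sketch -- not the actual Ceph OSDMonitor code and not the
logic of PR 19076 -- contrasting a min_last_epoch_clean floor computed over
all reporting OSDs with one computed over only the up OSDs. The OsdReport
struct and both function names are invented purely for illustration.

    // Hypothetical sketch only: how a monitor *could* derive the osdmap trim
    // bound from per-OSD last_epoch_clean reports, and how the answer changes
    // when down OSDs are excluded. Not the real Ceph implementation.
    #include <algorithm>
    #include <cstdint>
    #include <iostream>
    #include <limits>
    #include <map>

    using epoch_t = std::uint32_t;

    struct OsdReport {
      epoch_t last_epoch_clean;  // last epoch this OSD reported all PGs clean
      bool up;                   // current up/down state in the osdmap
    };

    // Variant A: include every OSD that has ever reported. A failed OSD pins
    // the floor at its stale report, so osdmaps stop trimming until it comes
    // back (or the mon forgets the report, e.g. after a restart).
    epoch_t min_lec_all(const std::map<int, OsdReport>& osds) {
      epoch_t floor = std::numeric_limits<epoch_t>::max();
      for (const auto& [id, r] : osds)
        floor = std::min(floor, r.last_epoch_clean);
      return floor;
    }

    // Variant B: only consider OSDs that are currently up. The floor can keep
    // advancing while an OSD is down; whether that is safe depends on the
    // remaining up OSDs still holding a low enough epoch.
    epoch_t min_lec_up_only(const std::map<int, OsdReport>& osds) {
      epoch_t floor = std::numeric_limits<epoch_t>::max();
      for (const auto& [id, r] : osds)
        if (r.up)
          floor = std::min(floor, r.last_epoch_clean);
      return floor;
    }

    int main() {
      std::map<int, OsdReport> osds = {
          {0, {100, true}},   // osd.0: clean through epoch 100
          {1, {102, true}},   // osd.1: clean through epoch 102
          {2, {40, false}},   // osd.2: failed, its last report is stale
      };
      std::cout << "all OSDs:     " << min_lec_all(osds) << "\n";      // 40
      std::cout << "up OSDs only: " << min_lec_up_only(osds) << "\n";  // 100
    }

With the stale report included (variant A), the trim bound stays pinned at
the failed OSD's last_epoch_clean, which is roughly the stuck-trimming
symptom described above; with it excluded (variant B), the bound can keep
advancing, which is where the "trim too many maps" question arises if many
OSDs are down only transiently.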