If you don't already know the cause, you should investigate why your
cluster could not recover after the loss of a single OSD. Your
solution seems valid given your description.

On Thu, Aug 2, 2018 at 12:15 PM, J David <j.david.lists@xxxxxxxxx> wrote:
> On Wed, Aug 1, 2018 at 9:53 PM, Brad Hubbard <bhubbard@xxxxxxxxxx> wrote:
>> What is the status of the cluster with this osd down and out?
>
> Briefly, miserable.
>
> All client IO was blocked.
>
> 36 pgs were stuck “down.” pg query reported that they were blocked by
> that OSD, despite that OSD not holding any replicas for them, with
> diagnostics (now gone off of scrollback, sorry) about how bringing
> that OSD online or marking it lost might resolve the issue.
>
> With blocked IO and pgs stuck “down” I was not at all comfortable
> marking the OSD lost.
>
> Both conditions resolved after taking the steps outlined in the post I
> just made to ceph-users.
>
> Thanks!

-- 
Cheers,
Brad
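
For readers following the thread, a minimal sketch of the diagnostic
steps discussed above, using stock Ceph CLI commands. The placement
group id (2.1f) and OSD id (42) are placeholders, not values from
this incident:

    # Survey overall health; lists down/incomplete PGs and blocked requests.
    ceph health detail

    # List PGs stuck in an inactive state (this includes "down" PGs).
    ceph pg dump_stuck inactive

    # Query one affected PG and inspect its recovery_state section,
    # which names the OSD the PG is blocked by and notes whether
    # starting that OSD or marking it lost may let peering proceed.
    ceph pg 2.1f query

    # Confirm which OSDs are down/out and where they sit in the CRUSH tree.
    ceph osd tree

    # Last resort, only once you understand why the PGs cannot
    # otherwise recover: declare the OSD permanently lost.
    ceph osd lost 42 --yes-i-really-mean-it

As the thread notes, marking an OSD lost while PGs are down is risky;
the pg query output is the place to check what is actually blocking
recovery before taking that step.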