Re: Luminous OSD crashes every few seconds: FAILED assert(0 == "past_interval end mismatch")

If you don't already know the cause, you should investigate why your
cluster could not recover after the loss of a single OSD.

Your solution seems valid given your description.


On Thu, Aug 2, 2018 at 12:15 PM, J David <j.david.lists@xxxxxxxxx> wrote:
> On Wed, Aug 1, 2018 at 9:53 PM, Brad Hubbard <bhubbard@xxxxxxxxxx> wrote:
>> What is the status of the cluster with this osd down and out?
>
> Briefly, miserable.
>
> All client IO was blocked.
>
> 36 pgs were stuck “down.”  pg query reported that they were blocked by
> that OSD, despite that OSD not holding any replicas for them, with
> diagnostics (now gone off of scrollback, sorry) about how bringing
> that OSD online or marking it lost might resolve the issue.
>
> With blocked IO and pgs stuck “down” I was not at all comfortable
> marking the OSD lost.
>
> Both conditions resolved after taking the steps outlined in the post I
> just made to ceph-users.
>
> Thanks!
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Cheers,
Brad