If you don't already know the cause, you should investigate why your
cluster could not recover after the loss of a single OSD. Your
solution seems valid given your description.

On Thu, Aug 2, 2018 at 12:15 PM, J David <j.david.lists@xxxxxxxxx> wrote:
> On Wed, Aug 1, 2018 at 9:53 PM, Brad Hubbard <bhubbard@xxxxxxxxxx> wrote:
>> What is the status of the cluster with this osd down and out?
>
> Briefly, miserable.
>
> All client IO was blocked.
>
> 36 pgs were stuck “down.” pg query reported that they were blocked by
> that OSD, despite that OSD not holding any replicas for them, with
> diagnostics (now gone off of scrollback, sorry) about how bringing
> that OSD online or marking it lost might resolve the issue.
>
> With blocked IO and pgs stuck “down” I was not at all comfortable
> marking the OSD lost.
>
> Both conditions resolved after taking the steps outlined in the post I
> just made to ceph-users.
>
> Thanks!

-- 
Cheers,
Brad
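
For readers following the thread, a minimal sketch of the diagnostic
steps discussed above, using stock Ceph CLI commands. The placement
group id (2.1f) and OSD id (42) are placeholders, not values from
this incident:

    # Survey overall health; lists down/incomplete PGs and blocked requests.
    ceph health detail

    # List PGs stuck in an inactive state (this includes "down" PGs).
    ceph pg dump_stuck inactive

    # Query one affected PG and inspect its recovery_state section,
    # which names the OSD the PG is blocked by and notes whether
    # starting that OSD or marking it lost may let peering proceed.
    ceph pg 2.1f query

    # Confirm which OSDs are down/out and where they sit in the CRUSH tree.
    ceph osd tree

    # Last resort, only once you understand why the PGs cannot
    # otherwise recover: declare the OSD permanently lost.
    ceph osd lost 42 --yes-i-really-mean-it

As the thread notes, marking an OSD lost while PGs are down is risky;
the pg query output is the place to check what is actually blocking
recovery before taking that step.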