Hi All,
Some feedback on my end. I managed to recover the "lost data" from one of the other OSDs. It seems my initial summary was a bit off, in that the PGs were in fact replicated; Ceph just wanted to confirm that the objects were still relevant.
For future reference, I basically marked the OSD as lost
> ceph osd lost <id>
Then the PGs went into an incomplete state.
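For anyone hitting the same thing, the incomplete PGs show up with the usual commands, roughly:
> ceph health detail
> ceph pg dump_stuck inactive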
After that I temporarily set an option on the OSDs to ignore the history (osd_find_best_info_ignore_history_les). Got the info from http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-March/017270.html
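For reference, one way to set it temporarily (not necessarily the exact mechanism I used; <id> is the affected OSD) is in ceph.conf on the OSD host, followed by a restart:
# /etc/ceph/ceph.conf, [osd] section, temporary only
osd_find_best_info_ignore_history_les = true
> systemctl restart ceph-osd@<id>
Remember to remove the option again once the PGs have peered.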
After that Ceph was happy and started to rebalance the cluster. Phew, crisis averted.
This failure did, however, convince me to increase our pool replication (size/min_size) from 2/1 to 3/2, sacrificing usable space for reliability.
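For anyone wanting to do the same, it is a per-pool setting, something along the lines of (<pool> being your pool name):
> ceph osd pool set <pool> size 3
> ceph osd pool set <pool> min_size 2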
Now I need to give feedback on what happened. This is what I am still not sure about, as SMART does not show any sector errors. I might as well start a badblocks run and see if it detects anything.
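Probably something like this, read-only so it does not touch the data (/dev/sdX being a placeholder for the disk behind the failed OSD):
> badblocks -sv /dev/sdX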
As always, I am open to other suggestions as to where to look for clues on what went wrong.
Kind regards
On Mon, 1 Jul 2019 at 09:31, Ian Coetzee <ceph@xxxxxxxxxxxxxxxxx> wrote:
Hi Guys,

This is a cross-post from the proxmox ML.

This morning I had a bit of a big boo-boo on our production system. After a very sudden network outage somewhere during the night, one of my ceph-osds is no longer starting up.

If I try and start it manually, I get a very spectacular failure, see link. As near as I can tell, it seems to be asserting whether a file exists; I have yet to determine which file that would be. Any pointers are welcome, as well as any other ideas to get the OSD back. For some reason there is data on the OSD that was not replicated to my other OSDs, so I cannot just re-init this OSD as some of the posts I could find suggest.

Kind regards