On Sat, Aug 29, 2015 at 3:32 PM, Евгений Д. <ineu.main@xxxxxxxxx> wrote:
> I'm running a 3-node cluster with Ceph (it's a Deis cluster, so the Ceph
> daemons are containerized). There are 3 OSDs and 3 mons. After rebooting
> all nodes one by one, all monitors are up, but only two of the three OSDs
> are up. The 'down' OSD is actually running but is never marked up/in.
> All three mons are reachable from inside the OSD container.
> I ran `log dump` for this OSD and found this line:
>
> Aug 29 06:19:39 staging-coreos-1 sh[7393]: -99> 2015-08-29 06:18:51.855432
> 7f5902009700 3 osd.0 0 handle_osd_map epochs [1,90], i have 0, src has
> [1,90]
>
> Is this the reason the OSD cannot connect to the cluster? If so, why could
> it have happened? I haven't removed any data from /var/lib/ceph/osd.
> Is it possible to bring this OSD back into the cluster without completely
> recreating it?
>
> The Ceph version is:
>
> root@staging-coreos-1:/# ceph -v
> ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)

It's pretty unlikely. I presume (since the OSD has no maps) that it has
never actually been up and in the cluster? Otherwise its data store has
been pretty badly corrupted, since it doesn't have any of the requisite
metadata. In that case you'll probably be best off recreating it (with 3
OSDs I assume all your PGs are still active).
-Greg

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
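[Editor's note: if recreating the OSD really is the only way out, a rough sketch of the usual sequence is below. This is not from the thread: osd.0 is assumed from the quoted log line, /dev/sdb is a placeholder for the OSD's actual disk, and in a containerized Deis deployment the container's own scripts may wrap some of these steps. Confirm with `ceph -s` that all PGs are active before removing anything.]

```shell
# Sanity check first: with 3 OSDs, make sure the cluster can serve I/O
# without the down OSD before destroying its data.
ceph -s
ceph osd tree

# Remove all traces of the dead OSD from the cluster maps
# (osd.0 assumed here, based on the log line in the original mail).
ceph osd crush remove osd.0   # drop it from the CRUSH map
ceph auth del osd.0           # remove its cephx key
ceph osd rm 0                 # delete it from the OSD map

# Recreate the OSD on its disk; /dev/sdb is a placeholder and must be
# replaced with the real device backing /var/lib/ceph/osd for this node.
ceph-disk prepare /dev/sdb
ceph-disk activate /dev/sdb1
```

After activation the new OSD should come up with a fresh ID, receive current maps from the mons, and backfill its PGs from the two surviving replicas.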