Re: recover from node failure / monitor and osds do not come back

Your OSDs aren't supposed to be listed in the config file, but they
should show up under /var/lib/ceph. Probably your OSD disks aren't
being mounted for some reason (that would be the bug). Try mounting
them and seeing what blocked the mount.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
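
A minimal way to check that, assuming the standard ceph-deploy layout with
one data partition per OSD mounted at /var/lib/ceph/osd/ceph-<id> (the
device name /dev/sdb1 below is only a placeholder for whatever partition
ceph-disk prepared on your server):

 # ls /var/lib/ceph/osd/ceph-0        <- an empty directory means the data partition is not mounted
 # mount | grep /var/lib/ceph/osd     <- shows which OSD data dirs are actually mounted
 # dmesg | tail -n 50                 <- look for mount/filesystem errors from the boot
 # mount /dev/sdb1 /var/lib/ceph/osd/ceph-0
 # /etc/init.d/ceph start osd.0

If the manual mount succeeds and the OSD starts, the remaining question is
why the partition was not mounted automatically at boot (ceph-disk activate
on the data partition should normally mount it and start the daemon in one
step).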


On Wed, Feb 26, 2014 at 4:05 AM, Diedrich Ehlerding
<diedrich.ehlerding@xxxxxxxxxxxxxx> wrote:
> My configuration is: two osd servers, one admin node, three monitors;
> all running 0.72.2
>
> I had to switch off one of the OSD servers. The good news is: As
> expected, all clients survived and continued to work with the
> cluster, and the cluster entered a "health warn" state (one monitor
> down, 5 of 10 osds down).
>
> The bad news is: I cannot resume this server's operation. When I
> booted the server, the monitor was started automatically - but it did
> not join the cluster. "/etc/init.d/ceph start mon" says "already
> running", but ceph -s still says that one monitor (this one)  is
> missing.
>
> And the OSDs do not come back; nor can I restart them; error message
> is:
>
> # /etc/init.d/ceph start osd.0
> /etc/init.d/ceph: osd.0 not found (/etc/ceph/ceph.conf defines
> mon.hvrrzrx301 , /var/lib/ceph defines mon.hvrrzrx301)
>
> As expected, ceph osd tree displays the osds as down:
>
> -1      2.7     root default
> -2      1.35            host hvrrzrx301
> 0       0.27                    osd.0   down    0
> 2       0.27                    osd.2   down    0
> 4       0.27                    osd.4   down    0
> 6       0.27                    osd.6   down    0
> 8       0.27                    osd.8   down    0
> -3      1.35            host hvrrzrx303
> 1       0.27                    osd.1   up      1
> 3       0.27                    osd.3   up      1
> 5       0.27                    osd.5   up      1
> 7       0.27                    osd.7   up      1
> 9       0.27                    osd.9   up      1
>
>
> my ceph.conf only contains those settings which "ceph-deploy new"
> installed there; i.e., the osds are not mentioned in ceph.conf. I
> assume that this is the problem with my osds? Apparently the cluster
> (the surviving monitors) still knows that osd.0, osd.2 etc. should
> appear on the failed node.
>
> Alas, I couldn't find any description of how to configure osds within
> ceph.conf ... I tried to define
> [osd.0]
> host = my_server
> devs = /dev/sdb2
> data = /var/lib/ceph/osd/ceph-0
>
> but now it complains that no filesystem type is defined ...
>
> To summarize: where can I find rules and procedures for setting up a
> ceph.conf other than via ceph-deploy, and what must I do in addition
> to ceph-deploy so that I can survive a node outage and reattach the
> node to the cluster, with respect to both the monitor on that node
> and the osds?
>
>
> best regards
> --
> Diedrich Ehlerding, Fujitsu Technology Solutions GmbH,
> FTS CE SC PS&IS W, Hildesheimer Str 25, D-30880 Laatzen
> Fon +49 511 8489-1806, Fax -251806, Mobil +49 173 2464758
> Firmenangaben: http://de.ts.fujitsu.com/imprint.html
>
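
If one does want to spell the OSDs out in ceph.conf anyway (as Greg notes
above, the standard ceph-deploy layout needs no per-OSD sections at all,
only the mounted data directories under /var/lib/ceph), the piece the init
script is complaining about is the filesystem type. A sketch of such a
section, using the documented osd mkfs/mount options; the host, device and
mount point are taken from the thread, the remaining values are
placeholders, and whether the 0.72 sysvinit script honors exactly these
keys for its mount is worth verifying:

[osd.0]
host = hvrrzrx301
devs = /dev/sdb2
osd data = /var/lib/ceph/osd/ceph-0
osd mkfs type = xfs
osd mount options xfs = rw,noatime,inode64
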
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



