Hi,

On Sat, 2011-04-02 at 05:59 +0200, Martin Wilderoth wrote:
> Hello,
> 
> One of my hosts ran out of disk space on the root file system (logfiles),
> so I restarted ceph and discovered the low disk space during the restart
> (osd2 and osd3).
> 

Do you have separate partitions for your OSD data? Or do you have one big
/ partition? I'd recommend a separate partition for your OSDs.

> ceph health gives a message like this
> 
> HEALTH_WARN osdmonitor: num_osds = 6, num_up_osds = 4, num_in_osds = 4 Some PGs are: degraded,peering
> 
> now osd.1 is dead, all the others are running
> 
> How do I get the running one up and in? And how do I know which osd it is?
> 

$ ceph osd dump -o -

That should tell you which OSD is down/out.

> how do I recover the dead one?
> 

Normally starting the OSD would be enough. Look closely though, you might
have hit a bug which caused the OSD to crash. If so, there should be a
file called "core" in / which has a core dump and could tell why the OSD
crashed:

$ gdb /usr/bin/cosd /core

Make sure you have the debug symbols (-dbg packages) installed when doing
so.

If you then monitor 'ceph -w', you should see the cluster recover and all
OSDs should come back up & in.

Wido
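
P.S. A few rough command sketches below, in case they help. Nothing in
them is specific to your setup, so treat the ids and paths as examples.

To see which OSDs the cluster considers down or out, the dump mentioned
above plus the health summary should be enough, and it's worth checking
that / has room again before restarting anything:

$ df -h /                       # make sure the root filesystem has free space again
$ ceph osd dump -o - | less     # look for entries marked down/out
$ ceph health                   # quick up/in counts, as in your HEALTH_WARN line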
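
To recover the dead OSD, restarting its daemon is normally all that is
needed, as said above. A sketch, assuming the daemon is called osd.1 in
your ceph.conf (the exact init-script spelling may differ on your
distribution); 'ceph osd in' is only needed if the OSD stays marked out
after it comes back up:

$ /etc/init.d/ceph start osd.1      # start the crashed OSD daemon
$ ceph osd in 1                     # only if it does not rejoin on its own
$ ceph -w                           # watch the PGs peer and recover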
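
If the daemon keeps crashing, a backtrace out of the core file is what
will tell us why. With the -dbg packages installed, the standard gdb
commands are enough:

$ gdb /usr/bin/cosd /core
(gdb) bt full                   # backtrace of the crashing thread, with local variables
(gdb) thread apply all bt       # backtraces of all threads, handy to attach to a bug report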