Hello,

I have separate partitions for my OSDs and the btrfs file system. I also
use an SSD for journaling. But I ran into a problem when the root file
system on one host filled up with log files: the file system reported
out of disk space, even though the OSDs were not filled to 100%. Later I
realised that the root file systems on the OSD hosts (osd2 and osd3) had
no space left, due to too much logging.

The only way I know to recover is to create a new filesystem in the
cluster :-) But that's bad for the data :-)

When I get problems with one OSD, they seem to crash one by one, and I
don't know how to get them up again without deleting all the data.

Hi,

On Sat, 2011-04-02 at 05:59 +0200, Martin Wilderoth wrote:
> Hello,
>
> One of my hosts ran out of disk space on the root file system (log
> files), so I restarted Ceph and discovered the low disk space during
> the restart (osd2 and osd3).
>

Do you have separate partitions for your OSD data? Or do you have one
big / partition? I'd recommend a separate partition for your OSDs.

> ceph health gives a message like this
>
> HEALTH_WARN osdmonitor: num_osds = 6, num_up_osds = 4, num_in_osds = 4 Some PGs are: degraded,peering
>
> Now osd.1 is dead; all the others are running.
>
> How do I get the running ones up and in? And how do I know which OSD
> it is?
>

$ ceph osd dump -o -

That should tell you which OSD is down/out.

> How do I recover the dead one?
>

Normally starting the OSD would be enough. Look closely though, you
might have hit a bug which caused the OSD to crash. If so, there should
be a file called "core" in / which has a core dump and could tell why
the OSD crashed:

$ gdb /usr/bin/cosd /core

Make sure you have the debug symbols (-dbg packages) installed when
doing so.

If you monitor 'ceph -w', you should see the cluster recover and all
OSDs should be up & in.
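As an aside, the "which OSD is down/out" check from `ceph osd dump` can also be scripted. A minimal sketch, assuming the dump is requested in JSON form (e.g. `ceph osd dump --format=json`) and that it contains an "osds" list with per-OSD "osd" and "up" fields; the sample data below is made up for illustration, so verify the schema against your Ceph version's actual output:

```python
import json

# Hypothetical sample of a JSON osd dump; field names are assumptions,
# not taken from a real cluster.
sample = json.dumps({
    "osds": [
        {"osd": 0, "up": 1, "in": 1},
        {"osd": 1, "up": 0, "in": 0},  # a down/out OSD
    ]
})

def down_osds(dump_json):
    """Return the ids of OSDs whose 'up' flag is not set."""
    dump = json.loads(dump_json)
    return [o["osd"] for o in dump.get("osds", []) if not o.get("up")]

print(down_osds(sample))  # -> [1]
```

Such a script could be run from cron to warn before a single dead OSD turns into several.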
Wido
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html