Hello,

I have separate partitions for my OSDs and the btrfs file system. I also
use an SSD for journaling. But I ran into a problem when the root file
system on one host filled up with log files: the file system reported
out of disk space, even though the OSDs were not filled to 100%. Later I
realised that the root file systems on the OSD hosts (osd2 and osd3) had
no space left, due to too much logging.

The only way I know to recover is to create a new filesystem in the
cluster :-) But that's bad for the data :-)

When I get problems with one OSD, they seem to crash one by one, and I
don't know how to get them up again without deleting all the data.

Hi,

On Sat, 2011-04-02 at 05:59 +0200, Martin Wilderoth wrote:
> Hello,
>
> One of my hosts ran out of disk space on the root file system (log
> files), so I restarted Ceph and discovered the low disk space during
> the restart (osd2 and osd3).
>

Do you have separate partitions for your OSD data? Or do you have one
big / partition? I'd recommend a separate partition for your OSDs.

> ceph health gives a message like this
>
> HEALTH_WARN osdmonitor: num_osds = 6, num_up_osds = 4, num_in_osds = 4 Some PGs are: degraded,peering
>
> Now osd.1 is dead; all the others are running.
>
> How do I get the running ones up and in? And how do I know which OSD
> it is?
>

$ ceph osd dump -o -

That should tell you which OSD is down/out.

> How do I recover the dead one?
>

Normally starting the OSD would be enough. Look closely though, you
might have hit a bug which caused the OSD to crash. If so, there should
be a file called "core" in / which has a core dump and could tell why
the OSD crashed:

$ gdb /usr/bin/cosd /core

Make sure you have the debug symbols (-dbg packages) installed when
doing so.

If you monitor 'ceph -w', you should see the cluster recover and all
OSDs should be up & in.
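As an aside, the "which OSD is down/out" check from `ceph osd dump` can also be scripted. A minimal sketch, assuming the dump is requested in JSON form (e.g. `ceph osd dump --format=json`) and that it contains an "osds" list with per-OSD "osd" and "up" fields; the sample data below is made up for illustration, so verify the schema against your Ceph version's actual output:

```python
import json

# Hypothetical sample of a JSON osd dump; field names are assumptions,
# not taken from a real cluster.
sample = json.dumps({
    "osds": [
        {"osd": 0, "up": 1, "in": 1},
        {"osd": 1, "up": 0, "in": 0},  # a down/out OSD
    ]
})

def down_osds(dump_json):
    """Return the ids of OSDs whose 'up' flag is not set."""
    dump = json.loads(dump_json)
    return [o["osd"] for o in dump.get("osds", []) if not o.get("up")]

print(down_osds(sample))  # -> [1]
```

Such a script could be run from cron to warn before a single dead OSD turns into several.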
Wido
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html