On 15/07/2015 10:55, Jelle de Jong wrote:
> On 13/07/15 15:40, Jelle de Jong wrote:
>> I was testing a ceph cluster with osd_pool_default_size = 2, and while
>> rebuilding the OSD on one ceph node a disk in another node started
>> getting read errors, and ceph kept taking the OSD down. Instead of
>> executing ceph osd set nodown while the other node was rebuilding, I
>> kept restarting the OSD for a while; ceph would take the OSD in for a
>> few minutes and then take it back down.
>>
>> I then removed the bad OSD from the cluster and later added it back in
>> with the nodown flag set and a weight of zero, moving all the data
>> away. Then I removed the OSD again and added a new OSD with a new hard
>> drive.
>>
>> However, I ended up with the following cluster status and I can't seem
>> to find out how to get the cluster healthy again. I'm doing this as a
>> test before taking this ceph configuration further into production.
>>
>> http://paste.debian.net/plain/281922
>>
>> If I lost data, my bad, but how could I figure out in which pool the
>> data was lost and in which rbd volume (so which kvm guest lost data)?
> Anybody that can help?
>
> Can I somehow reweight some OSDs to resolve the problems? Or should I
> rebuild the whole cluster and lose all the data?

If your min_size is 2, try setting it to 1 and restart each of your OSDs.
If ceph -s doesn't show any progress repairing your data, you'll have to
either get the developers to help salvage what can be recovered from your
disks, or rebuild the cluster with size=3 and restore your data.

Lionel
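
A rough sketch of what that could look like on the command line (pool,
image and OSD names below are placeholders, and the restart command
depends on your init system):

    # lower min_size on each affected pool, then restart the OSDs
    ceph osd lspools
    ceph osd pool get <poolname> min_size
    ceph osd pool set <poolname> min_size 1
    service ceph restart osd.<id>    # or: systemctl restart ceph-osd@<id>

    # watch whether recovery makes any progress
    ceph -s
    ceph -w

As for finding out which pool and which rbd volume lost data: the PG ids
reported by ceph health detail are prefixed with the pool id (pg 4.2f
belongs to the pool with id 4 in ceph osd lspools), and rbd info shows
each image's block_name_prefix, which you can match against the object
names reported for the damaged PGs:

    ceph health detail
    ceph pg dump_stuck unclean
    ceph pg <pgid> query
    ceph pg <pgid> list_missing
    rbd ls -p <poolname>
    rbd info <poolname>/<imagename>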