On 15/07/15 10:55, Jelle de Jong wrote:
> On 13/07/15 15:40, Jelle de Jong wrote:
>> I was testing a ceph cluster with osd_pool_default_size = 2, and while
>> rebuilding the OSD on one ceph node, a disk in another node started
>> getting read errors and ceph kept taking that OSD down. Instead of
>> executing ceph osd set nodown while the other node was rebuilding, I
>> kept restarting the OSD for a while; ceph would take the OSD in for a
>> few minutes and then take it back down again.
>>
>> I then removed the bad OSD from the cluster and later added it back in
>> with the nodown flag set and a weight of zero, moving all the data away.
>> Then I removed the OSD again and added a new OSD with a new hard drive.
>>
>> However, I ended up with the following cluster status and I can't seem
>> to find out how to get the cluster healthy again. I'm doing this as a
>> test before taking this ceph configuration into further production.
>>
>> http://paste.debian.net/plain/281922
>>
>> If I lost data, my bad, but how can I figure out in which pool the data
>> was lost and in which rbd volume (i.e. which kvm guest lost data)?
>
> Can anybody help?
>
> Can I somehow reweight some OSDs to resolve the problems? Or should I
> rebuild the whole cluster and lose all the data?

# ceph pg 3.12 query
http://paste.debian.net/284812/

I used ceph pg force_create_pg x.xx on all the incomplete pgs and I don't
have any stuck pgs any more, but there are still incomplete ones.

# ceph health detail
http://paste.debian.net/284813/

How can I get the incomplete pgs active again?

Kind regards,

Jelle de Jong
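
PS: for the pool/rbd question above, this is the kind of procedure I was
thinking of trying (just a rough sketch: the pool name "rbd", the image
name "vm-disk-1" and the rbd_data prefix are only examples, and it assumes
format 2 images; format 1 images use the rb.0.* prefix shown by rbd info
instead):

    # the number before the dot in a pg id is the pool id,
    # so pg 3.12 lives in pool 3
    ceph osd lspools
    ceph health detail | grep incomplete

    # for each rbd image in that pool, look up its object name prefix
    rbd ls -p rbd
    rbd info rbd/vm-disk-1    # note the block_name_prefix, e.g. rbd_data.1234567890ab

    # check whether any of that image's objects map onto the incomplete pg
    rados -p rbd ls | grep rbd_data.1234567890ab | while read obj; do
        ceph osd map rbd "$obj"
    done | grep '(3\.12)'

Any image whose objects show up for that pg would be one of the kvm guests
that lost data, if I understand the object naming correctly.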