On 13/07/15 15:40, Jelle de Jong wrote:
> I was testing a Ceph cluster with osd_pool_default_size = 2, and while
> rebuilding the OSD on one Ceph node, a disk in another node started
> getting read errors and Ceph kept marking that OSD down. Instead of
> executing ceph osd set nodown while the other node was rebuilding, I
> kept restarting the OSD for a while; Ceph would take the OSD back in
> for a few minutes and then mark it down again.
>
> I then removed the bad OSD from the cluster and later added it back in
> with the nodown flag set and a weight of zero, moving all the data
> away. Then I removed the OSD again and added a new OSD with a new hard
> drive.
>
> However, I ended up with the following cluster status, and I can't
> seem to find out how to get the cluster healthy again. I'm running
> these tests before taking this Ceph configuration into production.
>
> http://paste.debian.net/plain/281922
>
> If I lost data, my bad, but how could I figure out in which pool the
> data was lost and in which RBD volume (i.e. which KVM guest lost data)?

Can anybody help? Can I somehow reweight some OSDs to resolve the
problems? Or should I rebuild the whole cluster and lose all data?

Kind regards,

Jelle de Jong
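
For reference, a minimal sketch of how one might approach both questions
(draining a suspect OSD, and narrowing down which pool and RBD image a
broken PG belongs to), assuming a standard RBD-on-Ceph setup with the
stock ceph/rados/rbd command-line tools. The OSD id "12", the pool name,
the image name, and the object name below are placeholders, not values
taken from the paste, and whether any of this is appropriate depends on
the actual PG states shown there.

  # Drain a suspect OSD by moving its data elsewhere before removal
  # (osd id 12 is a placeholder); wait for backfill to finish, then
  # mark it out.
  ceph osd crush reweight osd.12 0
  ceph osd out 12

  # See which PGs are unhealthy; the number before the dot in a PG id
  # (e.g. the "2" in "2.1f") is the pool id.
  ceph health detail
  ceph pg dump_stuck unclean

  # Translate pool ids to pool names.
  ceph osd lspools

  # For a suspect pool, list the RBD images and note each image's
  # block_name_prefix; all of an image's data objects start with it.
  rbd -p <poolname> ls
  rbd -p <poolname> info <imagename> | grep block_name_prefix

  # Check which PG (and OSDs) one of those data objects maps to; if it
  # lands in one of the broken PGs, that image (and its KVM guest) is
  # affected.
  rados -p <poolname> ls | grep <block_name_prefix> | head
  ceph osd map <poolname> <object-name-from-previous-command>

Listing every object with rados ls can be slow on a large pool; it is
only meant as a spot check to see which images have objects in the
affected PGs, not as something to run routinely.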