If all of your PGs now have an empty down_osds_we_would_probe, I'd run through this discussion again. The commands to tell Ceph to give up on lost data should have an effect now.
That's my experience anyway. Nothing progressed until I took care of down_osds_we_would_probe. After that was empty, I was able to repair. It wasn't immediate though. It still took ~24 hours, and a few OSD restarts, for the cluster to get itself healthy. You might try sequentially restarting OSDs. It shouldn't be necessary, but it shouldn't make anything worse.
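Roughly, the commands I have in mind are along these lines; the OSD number and PG ID below are only placeholders, so substitute your own and double-check the pg query output before running anything destructive:

    # mark an OSD that is gone for good as lost, so peering stops waiting on it
    ceph osd lost 7 --yes-i-really-mean-it

    # tell a PG to give up on objects it can no longer recover
    ceph pg 2.5 mark_unfound_lost revert

    # or, if there is nothing left to revert to, discard them instead
    ceph pg 2.5 mark_unfound_lost delete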
On Mon, Nov 10, 2014 at 7:17 AM, Chad Seys <cwseys@xxxxxxxxxxxxxxxx> wrote:
Hi Craig and list,
> > > If you create a real osd.20, you might want to leave it OUT until you
> > > get things healthy again.
I created a real osd.20 (and it turns out I needed an osd.21 also).
ceph pg x.xx query no longer lists down OSDs to probe:
"down_osds_we_would_probe": [],
But I cannot find the magic command that will remove these incomplete PGs.
Anyone know how to remove incomplete PGs?
Thanks!
Chad.