Try restarting the primary osd for that pg with
osd_find_best_info_ignore_history_les set to true (don't leave it set
long term).

-Sam

On Tue, May 17, 2016 at 7:50 AM, Hein-Pieter van Braam <hp@xxxxxx> wrote:
> Hello,
>
> Today we had a power failure in a rack housing our OSD servers. We had
> 7 of our 30 total OSD nodes down. For the affected PG, 2 out of the 3
> OSDs went down.
>
> After everything was back and mostly healthy I found one placement
> group marked as incomplete. I can't figure out why.
>
> I'm running ceph 0.94.6 on CentOS 7. The following steps have been
> tried, in this order:
>
> 1) Reduce the min_size from 2 to 1 (as suggested by ceph health detail)
> 2) Set the 2 OSDs that were down to 'out' (one by one) and wait for the
>    cluster to recover. (This did not work; I set them back in.)
> 3) Use ceph-objectstore-tool to export the pg from the 2 OSDs that went
>    down, then remove it and restart the OSDs.
> 4) When that did not work, import the data exported from the unaffected
>    OSD into the two remaining OSDs.
> 5) Import the data from the unaffected OSD into all OSDs that are
>    listed in "probing_osds".
>
> None of these had any effect on the stuck incomplete PG. I have
> attached the output of "ceph pg 54.3e9 query", "ceph health detail",
> and "ceph -s".
>
> The pool in question is largely read-only (it is an OpenStack RBD image
> pool), so I can leave it like this for the time being. Help would be
> very much appreciated!
>
> Thank you,
>
> - Hein-Pieter van Braam
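
For reference, a minimal sketch of how the suggestion at the top could be
applied. osd.NN is a placeholder for whichever OSD turns out to be primary
for pg 54.3e9, and the restart command depends on the init setup:

    # find the current primary (first OSD in the acting set)
    ceph pg map 54.3e9

    # on that OSD's host, set the flag for just that daemon in
    # /etc/ceph/ceph.conf:
    #   [osd.NN]
    #   osd find best info ignore history les = true

    # restart the daemon so it re-peers with the flag in effect
    service ceph restart osd.NN    # or: systemctl restart ceph-osd@NN

    # once the pg is active+clean, remove the setting again and restart
    # the OSD once more so it is not left on long term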
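
And for anyone hitting the same problem, the steps in the quoted message
map to roughly these commands (pool name, OSD ids, and file paths below
are placeholders; an OSD must be stopped before pointing
ceph-objectstore-tool at its store):

    # 1) lower min_size on the affected pool
    ceph osd pool set <poolname> min_size 1

    # 2) mark a down OSD out and wait for recovery
    ceph osd out NN

    # 3) export the pg copy from a stopped OSD, then remove it
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-NN \
        --journal-path /var/lib/ceph/osd/ceph-NN/journal \
        --pgid 54.3e9 --op export --file /root/54.3e9.export
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-NN \
        --journal-path /var/lib/ceph/osd/ceph-NN/journal \
        --pgid 54.3e9 --op remove

    # 4/5) import the exported copy into another stopped OSD
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-MM \
        --journal-path /var/lib/ceph/osd/ceph-MM/journal \
        --op import --file /root/54.3e9.export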