Re: PG stuck incomplete after power failure.

Hi,

Thank you so much!

This fixed my issue completely, apart from one image that was
apparently mid-upload when the rack lost power.
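
For the archives, the shape of the fix was roughly this (osd.21 stands
in for the primary OSD of the affected PG on my cluster; the ceph.conf
section and service commands are from my setup, adjust for yours):

    # on the host of the PG's primary OSD, add under [osd] in ceph.conf:
    osd_find_best_info_ignore_history_les = true

    # restart that OSD so the setting is picked up during peering
    service ceph restart osd.21

    # once the PG is active+clean again, remove the setting and restart
    # the OSD once more, per Sam's note not to leave it set long term
    service ceph restart osd.21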

Is there anything I can do to prevent this from happening in the
future, or a way to detect this issue?
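
The closest thing I've found for detection so far is a periodic check
along these lines, though I'm not sure it catches the problem any
earlier than ceph health does (54.3e9 is just my affected PG, as an
example):

    # list any PGs stuck inactive (incomplete PGs show up here)
    ceph pg dump_stuck inactive

    # for a specific PG, see which OSDs peering is still waiting on
    ceph pg 54.3e9 query | grep -A10 '"probing_osds"'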

I've looked online for an explanation of exactly what this flag does,
but it appears to be somewhat poorly documented. Can you point me to
some documentation about this flag, and others like it? I'd like to
learn more!
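
In the meantime, the best I've managed locally is dumping the running
configuration from the OSD's admin socket, which at least shows the
option and its current value (the socket path below is the default on
my install; yours may differ):

    # show the live value of this option (and discover related ones)
    ceph --admin-daemon /var/run/ceph/ceph-osd.21.asok config show \
        | grep find_best_info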

Again, thank you so much for your time.

- HP

On Tue, 2016-05-17 at 14:08 -0700, Samuel Just wrote:
> Try restarting the primary osd for that pg with
> osd_find_best_info_ignore_history_les set to true (don't leave it set
> long term).
> -Sam
> 
> On Tue, May 17, 2016 at 7:50 AM, Hein-Pieter van Braam <hp@xxxxxx>
> wrote:
> > 
> > Hello,
> > 
> > Today we had a power failure in a rack housing our OSD servers. We
> > had 7 of our 30 total OSD nodes down. For the affected PG, 2 of its
> > 3 OSDs went down.
> > 
> > After everything was back and mostly healthy, I found one placement
> > group marked as incomplete. I can't figure out why.
> > 
> > I'm running Ceph 0.94.6 on CentOS 7. I have tried the following
> > steps, in this order:
> > 
> > 1) Reduced min_size from 2 to 1 (as suggested by ceph health
> > detail).
> > 2) Set the 2 OSDs that had been down to 'out' (one by one) and
> > waited for the cluster to recover. (This did not work; I set them
> > back in.)
> > 3) Used ceph-objectstore-tool to export the PG from the 2 OSDs that
> > went down, then removed it and restarted those OSDs.
> > 4) When that did not work, imported the data exported from the
> > unaffected OSD into the two remaining OSDs.
> > 5) Imported the data from the unaffected OSD into all OSDs noted
> > in "probing_osds".
> > 
> > None of these had any effect on the stuck incomplete PG. I have
> > attached the output of "ceph pg 54.3e9 query", "ceph health detail",
> > as well as "ceph -s".
> > 
> > The pool in question is largely read-only (it is an OpenStack RBD
> > image pool), so I can leave it like this for the time being. Help
> > would be very much appreciated!
> > 
> > Thank you,
> > 
> > - Hein-Pieter van Braam
> > 
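
(For anyone reading this in the archive: the ceph-objectstore-tool
invocations referred to in steps 3-5 above were along these lines; the
OSD ids, paths and the PG id 54.3e9 are from my cluster and purely
illustrative. The OSD has to be stopped while the tool runs.)

    service ceph stop osd.21

    # export the PG from an OSD that still has a copy
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-21 \
        --journal-path /var/lib/ceph/osd/ceph-21/journal \
        --pgid 54.3e9 --op export --file /root/54.3e9.export

    # remove the stale copy of the PG from an affected OSD (also stopped)
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 \
        --journal-path /var/lib/ceph/osd/ceph-7/journal \
        --pgid 54.3e9 --op remove

    # import the exported copy into that OSD, then start it again
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 \
        --journal-path /var/lib/ceph/osd/ceph-7/journal \
        --op import --file /root/54.3e9.export

    service ceph start osd.7
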
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


