On Friday, July 5, 2019 11:28:32 AM CDT Caspar Smit wrote:
> Kyle,
>
> Was the cluster still backfilling when you removed osd 6 or did you only
> check its utilization?

Yes, still backfilling.

> Running an EC pool with m=1 is a bad idea. EC pool min_size = k+1, so losing
> a single OSD results in inaccessible data.
> Your incomplete PGs are probably all EC pool pgs, please verify.

Yes, also correct.

> If the above statement is true, you could *temporarily* set min_size to 2
> (on your EC pools) to get back access to your data again, but this is a
> very dangerous action. Losing another OSD during this period results in
> actual data loss.

This resolved the issue. I had seen reducing min_size mentioned elsewhere,
but for some reason I thought that applied only to replicated pools. Thank
you!

> Kind regards,
> Caspar Smit
>
> On Fri, 5 Jul 2019 at 01:17, Kyle <aradian@xxxxxxxxx> wrote:
> > Hello,
> >
> > I'm working with a small ceph cluster (about 10TB, 7-9 OSDs, all
> > Bluestore on lvm) and recently ran into a problem with 17 pgs marked as
> > incomplete after adding/removing OSDs.
> >
> > Here's the sequence of events:
> > 1. 7 osds in the cluster, health is OK, all pgs are active+clean
> > 2. 3 new osds on a new host are added, lots of backfilling in progress
> > 3. osd 6 needs to be removed, so we do "ceph osd crush reweight osd.6 0"
> > 4. after a few hours we see "min osd.6 with 0 pgs" from "ceph osd
> >    utilization"
> > 5. ceph osd out 6
> > 6. systemctl stop ceph-osd@6
> > 7. the drive backing osd 6 is pulled and wiped
> > 8. backfilling has now finished; all pgs are active+clean except for the
> >    17 incomplete pgs
> >
> > From reading the docs, it sounds like there has been unrecoverable data
> > loss in those 17 pgs. That raises some questions for me:
> >
> > Was "ceph osd utilization" only showing a goal of 0 pgs allocated instead
> > of the current actual allocation?
> >
> > Why is there data loss from a single osd being removed? Shouldn't that be
> > recoverable?
> > All pools in the cluster are either replicated 3 or erasure-coded k=2,m=1
> > with the default "host" failure domain. They shouldn't suffer data loss
> > from a single osd being removed, even if there were no reweighting
> > beforehand. Does the backfilling temporarily reduce data durability in
> > some way?
> >
> > Is there a way to see which pgs actually have data on a given osd?
> >
> > I attached an example of one of the incomplete pgs.
> >
> > Thanks for any help,
> >
> > Kyle
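For the archives, the workaround Caspar describes looks roughly like the
sketch below. The pool name "ecpool" is only a placeholder (it is not a pool
from this thread); check your own pool names and current min_size values
before changing anything.

    # confirm which pools are erasure-coded and what their current min_size is
    ceph osd pool ls detail

    # list pgs stuck incomplete, and pgs that still map to a given osd
    ceph pg ls incomplete
    ceph pg ls-by-osd osd.6

    # temporarily allow I/O with only k=2 shards available
    # (dangerous: losing another OSD during this window means real data loss)
    ceph osd pool set ecpool min_size 2

    # once backfill finishes and the pgs are active+clean again,
    # restore the default min_size = k+1 = 3 for a k=2,m=1 profile
    ceph osd pool set ecpool min_size 3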