Re: Understanding incomplete PGs

Hi

The "ec unable to recover when below min size" thing has very recently been fixed for octopus.

See https://tracker.ceph.com/issues/18749 and https://github.com/ceph/ceph/pull/17619


The docs have been updated with a section on this issue: http://docs.ceph.com/docs/master/rados/operations/erasure-code/#erasure-coded-pool-recover
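
For a new pool, the short version of that doc section is to keep m >= 2 so the pool can still recover after losing a host. Roughly something like this (profile/pool names and PG counts below are just placeholders, not taken from this thread):

    ceph osd erasure-code-profile set ec-2-2 k=2 m=2 crush-failure-domain=host
    ceph osd pool create ecpool 64 64 erasure ec-2-2
    # verify min_size against the guidance in that doc section
    ceph osd pool get ecpool min_size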


/Torben


On 05.07.2019 11:50, Paul Emmerich wrote:

* There are virtually no use cases for EC pools with m=1; it's a bad configuration because you can't have both availability and durability.

* Due to weird internal restrictions, EC pools below their min_size can't recover; you'll probably have to reduce min_size temporarily to recover it (see the example commands after this list).

* Depending on your version, it might be necessary to restart some of the OSDs due to a (now fixed) bug that marked some objects as degraded if you removed or restarted an OSD while you had remapped objects.

* Run "ceph osd safe-to-destroy X" to check whether it's safe to destroy a given OSD.
 
 
 
 
--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Fri, Jul 5, 2019 at 1:17 AM Kyle <aradian@xxxxxxxxx> wrote:
Hello,

I'm working with a small Ceph cluster (about 10 TB, 7-9 OSDs, all BlueStore on
LVM) and recently ran into a problem with 17 PGs marked as incomplete after
adding/removing OSDs.

Here's the sequence of events:
1. 7 osds in the cluster, health is OK, all pgs are active+clean
2. 3 new osds on a new host are added, lots of backfilling in progress
3. osd 6 needs to be removed, so we do "ceph osd crush reweight osd.6 0"
4. after a few hours we see "min osd.6 with 0 pgs" from "ceph osd utilization"
5. ceph osd out 6
6. systemctl stop ceph-osd@6
7. the drive backing osd 6 is pulled and wiped
8. backfilling has now finished; all pgs are active+clean except for 17
incomplete pgs

From reading the docs, it sounds like there has been unrecoverable data loss
in those 17 pgs. That raises some questions for me:

Was "ceph osd utilization" only showing a goal of 0 pgs allocated instead of
the current actual allocation?

Why is there data loss from a single osd being removed? Shouldn't that be
recoverable?
All pools in the cluster are either replicated 3 or erasure-coded k=2,m=1 with
the default "host" failure domain. They shouldn't suffer data loss from a single
osd being removed, even if there had been no reweighting beforehand. Does the
backfilling temporarily reduce data durability in some way?

Is there a way to see which pgs actually have data on a given osd?

I attached an example of one of the incomplete pgs.

Thanks for any help,

Kyle


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
