Re: objects degraded higher than 100%

Gregory Farnum <gfarnum@xxxxxxxxxx> · Fri, 13 Oct 2017 16:17:01 +0000

On Fri, Oct 13, 2017 at 7:57 AM Florian Haas <florian@xxxxxxxxxxx> wrote:
> On Thu, Oct 12, 2017 at 7:56 PM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>> On Thu, Oct 12, 2017 at 10:52 AM Florian Haas <florian@xxxxxxxxxxx> wrote:
>> > On Thu, Oct 12, 2017 at 7:22 PM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>> >> On Thu, Oct 12, 2017 at 3:50 AM Florian Haas <florian@xxxxxxxxxxx> wrote:
>> >> > On Mon, Sep 11, 2017 at 8:13 PM, Andreas Herrmann <andreas@xxxxxxxx> wrote:
>> >> >> Hi,
>> >> >>
>> >> >> how could this happen:
>> >> >>
>> >> >>         pgs: 197528/1524 objects degraded (12961.155%)
>> >> >>
>> >> >> I did some heavy failover tests, but a value higher than 100% looks
>> >> >> strange (ceph version 12.2.0). Recovery is quite slow.
>> >> >>
>> >> >>   cluster:
>> >> >>     health: HEALTH_WARN
>> >> >>             3/1524 objects misplaced (0.197%)
>> >> >>             Degraded data redundancy: 197528/1524 objects degraded (12961.155%), 1057 pgs unclean, 1055 pgs degraded, 3 pgs undersized
>> >> >>
>> >> >>   data:
>> >> >>     pools:   1 pools, 2048 pgs
>> >> >>     objects: 508 objects, 1467 MB
>> >> >>     usage:   127 GB used, 35639 GB / 35766 GB avail
>> >> >>     pgs:     197528/1524 objects degraded (12961.155%)
>> >> >>              3/1524 objects misplaced (0.197%)
>> >> >>              1042 active+recovery_wait+degraded
>> >> >>              991  active+clean
>> >> >>              8    active+recovering+degraded
>> >> >>              3    active+undersized+degraded+remapped+backfill_wait
>> >> >>              2    active+recovery_wait+degraded+remapped
>> >> >>              2    active+remapped+backfill_wait
>> >> >>
>> >> >>   io:
>> >> >>     recovery: 340 kB/s, 80 objects/s
>> >> >
>> >> > Did you ever get to the bottom of this? I'm seeing something very
>> >> > similar on a 12.2.1 reference system:
>> >> >
>> >> > https://gist.github.com/fghaas/f547243b0f7ebb78ce2b8e80b936e42c
>> >> >
>> >> > I'm also seeing an unusual MISSING_ON_PRIMARY count in "rados df":
>> >> > https://gist.github.com/fghaas/59cd2c234d529db236c14fb7d46dfc85
>> >> >
>> >> > The odd thing in there is that the "bench" pool was empty when the
>> >> > recovery started (that pool had been wiped with "rados cleanup"), so
>> >> > the number of objects deemed to be missing from the primary really
>> >> > ought to be zero.
>> >> >
>> >> > It seems like it's considering these deleted objects to still require
>> >> > replication, but that sounds rather far fetched to be honest.
>> >>
>> >> Actually, that makes some sense. This cluster had an OSD down while
>> >> (some of) the deletes were happening?
>> >
>> > I thought of exactly that too, but no it didn't. That's the problem.
>>
>> Okay, in that case I've no idea. What was the timeline for the recovery
>> versus the rados bench and cleanup versus the degraded object counts, then?
>
> 1. Jewel deployment with filestore.
> 2. Upgrade to Luminous (including mgr deployment and "ceph osd
>    require-osd-release luminous"), still on filestore.
> 3. rados bench with subsequent cleanup.
> 4. All OSDs up, all PGs active+clean.
> 5. Stop one OSD. Remove from CRUSH, auth list, OSD map.
> 6. Reinitialize OSD with bluestore.
> 7. Start OSD, commencing backfill.
> 8. Degraded objects above 100%.
>
> Please let me know if that information is useful. Thank you!

Hmm, that does leave me a little perplexed.

David, do we maybe do something with degraded counts based on the number of objects identified in pg logs? Or some other heuristic for number of objects that might be stale? That's the only way I can think of to get these weird returning sets.
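
For anyone following along, here is a quick sanity check of the arithmetic in Andreas's quoted status output. It only uses the figures reported there; the replica count of 3 is an inference from 1524 / 508 and is not stated anywhere in the thread, and the helper name `percent` is made up for this sketch. It does not model how Ceph actually computes the degraded count.

```python
def percent(count: int, total: int) -> float:
    """Return count as a percentage of total."""
    return 100.0 * count / total


# Figures copied from the quoted "ceph -s" output.
objects = 508          # "objects: 508 objects, 1467 MB"
object_copies = 1524   # denominator of the degraded/misplaced ratios
degraded = 197528      # numerator of "objects degraded"
misplaced = 3          # numerator of "objects misplaced"

# Assumption: the denominator is objects * replica count (1524 / 508 = 3).
implied_replicas = object_copies // objects

degraded_pct = percent(degraded, object_copies)
misplaced_pct = percent(misplaced, object_copies)

print(f"implied replicas: {implied_replicas}")
print(f"degraded:  {degraded_pct:.3f}%")
print(f"misplaced: {misplaced_pct:.3f}%")
# The numerator is far larger than the total number of object copies,
# so it cannot be a subset of the objects currently in the pool.
print(f"degraded count exceeds all copies: {degraded > object_copies}")
```

So the percentage itself is computed consistently; it is the degraded numerator that counts far more objects than the pool currently holds.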
-Greg