On Thu, Oct 12, 2017 at 7:22 PM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>
>
> On Thu, Oct 12, 2017 at 3:50 AM Florian Haas <florian@xxxxxxxxxxx> wrote:
>>
>> On Mon, Sep 11, 2017 at 8:13 PM, Andreas Herrmann <andreas@xxxxxxxx>
>> wrote:
>> > Hi,
>> >
>> > how could this happen:
>> >
>> > pgs: 197528/1524 objects degraded (12961.155%)
>> >
>> > I did some heavy failover tests, but a value higher than 100% looks
>> > strange (ceph version 12.2.0). Recovery is quite slow.
>> >
>> >   cluster:
>> >     health: HEALTH_WARN
>> >             3/1524 objects misplaced (0.197%)
>> >             Degraded data redundancy: 197528/1524 objects degraded
>> > (12961.155%), 1057 pgs unclean, 1055 pgs degraded, 3 pgs undersized
>> >
>> >   data:
>> >     pools:   1 pools, 2048 pgs
>> >     objects: 508 objects, 1467 MB
>> >     usage:   127 GB used, 35639 GB / 35766 GB avail
>> >     pgs:     197528/1524 objects degraded (12961.155%)
>> >              3/1524 objects misplaced (0.197%)
>> >              1042 active+recovery_wait+degraded
>> >              991  active+clean
>> >              8    active+recovering+degraded
>> >              3    active+undersized+degraded+remapped+backfill_wait
>> >              2    active+recovery_wait+degraded+remapped
>> >              2    active+remapped+backfill_wait
>> >
>> >   io:
>> >     recovery: 340 kB/s, 80 objects/s
>>
>> Did you ever get to the bottom of this? I'm seeing something very
>> similar on a 12.2.1 reference system:
>>
>> https://gist.github.com/fghaas/f547243b0f7ebb78ce2b8e80b936e42c
>>
>> I'm also seeing an unusual MISSING_ON_PRIMARY count in "rados df":
>> https://gist.github.com/fghaas/59cd2c234d529db236c14fb7d46dfc85
>>
>> The odd thing in there is that the "bench" pool was empty when the
>> recovery started (that pool had been wiped with "rados cleanup"), so
>> the number of objects deemed to be missing from the primary really
>> ought to be zero.
>>
>> It seems like it's considering these deleted objects to still require
>> replication, but that sounds rather far-fetched, to be honest.
>
> Actually, that makes some sense. This cluster had an OSD down while
> (some of) the deletes were happening?

I thought of exactly that too, but no, it didn't. That's the problem.

> I haven't dug through the code, but I bet it is considering those as
> degraded objects because the out-of-date OSD knows it doesn't have the
> latest versions of them! :)

Yeah, I bet against that. :)

Another tidbit: these objects were not deleted with "rados rm"; they
were cleaned up after "rados bench". In the case quoted above, this was
an explicit "rados cleanup" after "rados bench --no-cleanup"; in
another, I saw the same behavior after a regular "rados bench" that
included the automatic cleanup.

So there are two hypotheses here:

(1) The deletion in rados bench neglects to do something that a regular
object deletion does. Given that at least one other thing is fishy in
rados bench (http://tracker.ceph.com/issues/21375), this may be down to
a simple oversight in the Luminous cycle, and would thus constitute a
fairly minor (if irritating) issue.

(2) Regular object deletion is buggy in some previously unknown
fashion. That would be a rather major problem.

By the way, *deleting the pool* altogether makes the degraded object
count drop to expected levels immediately. Probably no surprise there,
though.

Cheers,
Florian
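
P.S.: In case anyone wants to try to reproduce this, here's a rough
sketch of the sequence involved. Pool name, PG count, and bench
duration are illustrative, not the exact values from the system above:

  # Create a throwaway replicated pool and fill it with rados bench,
  # skipping the automatic cleanup:
  ceph osd pool create bench 64 64
  rados bench -p bench 60 write --no-cleanup

  # Delete the benchmark objects explicitly. (I saw the same behavior
  # after a plain "rados bench" run, i.e. with the automatic cleanup.)
  rados -p bench cleanup

  # Once recovery is running (how it gets triggered will vary; in my
  # case no OSD was down while the deletes happened), watch the
  # degraded object count climb well past 100%:
  ceph -s
  rados df    # note the MISSING_ON_PRIMARY column

  # Deleting the pool drops the degraded count back to expected levels
  # immediately (requires mon_allow_pool_delete=true):
  ceph osd pool rm bench bench --yes-i-really-really-mean-it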