Re: objects degraded higher than 100%

Gregory Farnum <gfarnum@xxxxxxxxxx> · Thu, 12 Oct 2017 17:56:50 +0000

On Thu, Oct 12, 2017 at 10:52 AM Florian Haas <florian@xxxxxxxxxxx> wrote:
On Thu, Oct 12, 2017 at 7:22 PM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:

> On Thu, Oct 12, 2017 at 3:50 AM Florian Haas <florian@xxxxxxxxxxx> wrote:

>>

>> On Mon, Sep 11, 2017 at 8:13 PM, Andreas Herrmann <andreas@xxxxxxxx> wrote:

>> > Hi,

>> >

>> > how could this happen:

>> >

>> >         pgs: 197528/1524 objects degraded (12961.155%)

>> >

>> > I did some heavy failover tests, but a value higher than 100% looks strange

>> > (ceph version 12.2.0). Recovery is quite slow.

>> >

>> >   cluster:

>> >     health: HEALTH_WARN

>> >             3/1524 objects misplaced (0.197%)

>> >             Degraded data redundancy: 197528/1524 objects degraded (12961.155%), 1057 pgs unclean, 1055 pgs degraded, 3 pgs undersized

>> >

>> >   data:

>> >     pools:   1 pools, 2048 pgs

>> >     objects: 508 objects, 1467 MB

>> >     usage:   127 GB used, 35639 GB / 35766 GB avail

>> >     pgs:     197528/1524 objects degraded (12961.155%)

>> >              3/1524 objects misplaced (0.197%)

>> >              1042 active+recovery_wait+degraded

>> >              991  active+clean

>> >              8    active+recovering+degraded

>> >              3    active+undersized+degraded+remapped+backfill_wait

>> >              2    active+recovery_wait+degraded+remapped

>> >              2    active+remapped+backfill_wait

>> >

>> >   io:

>> >     recovery: 340 kB/s, 80 objects/s

>>

>> Did you ever get to the bottom of this? I'm seeing something very

>> similar on a 12.2.1 reference system:

>>

>> https://gist.github.com/fghaas/f547243b0f7ebb78ce2b8e80b936e42c

>>

>> I'm also seeing an unusual MISSING_ON_PRIMARY count in "rados df":

>> https://gist.github.com/fghaas/59cd2c234d529db236c14fb7d46dfc85

>>

>> The odd thing in there is that the "bench" pool was empty when the

>> recovery started (that pool had been wiped with "rados cleanup"), so

>> the number of objects deemed to be missing from the primary really

>> ought to be zero.

>>

>> It seems like it's considering these deleted objects to still require

>> replication, but that sounds rather far-fetched, to be honest.

>

>

> Actually, that makes some sense. This cluster had an OSD down while (some

> of) the deletes were happening?

I thought of exactly that too, but no it didn't. That's the problem.

Okay, in that case I've no idea. What was the timeline for the recovery versus the rados bench and cleanup versus the degraded object counts, then?

> I haven't dug through the code but I bet it is considering those as degraded

> objects because the out-of-date OSD knows it doesn't have the latest

> versions on them! :)

Yeah I bet against that. :)

Another tidbit: these objects were not deleted with rados rm, they

were cleaned up after rados bench. In the case quoted above, this was

an explicit "rados cleanup" after "rados bench --no-cleanup"; in

another, I saw the same behavior after a regular "rados bench" that

included the automatic cleanup.

So there are two hypotheses here:

(1) The deletion in rados bench is neglecting to do something that a

regular object deletion does do. Given the fact that at least one

other thing is fishy in rados bench

(http://tracker.ceph.com/issues/21375), this may be due to some simple

oversight in the Luminous cycle, and thus would constitute a fairly

minor (if irritating) issue.

(2) Regular object deletion is buggy in some previously unknown

fashion. That would be a rather major problem.

These both seem exceedingly unlikely. *shrug*

By the way, *deleting the pool* altogether makes the degraded object

count drop to expected levels immediately. Probably no surprise there,

though.

Cheers,

Florian
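
For reference, the arithmetic behind the figures in the quoted status output can be checked with a short sketch. It assumes the denominator 1524 is the number of object copies, i.e. the 508 objects times a replication size of 3; the pool size is not stated anywhere in the thread, so that factor is an inference from 1524 / 508. The helper name degraded_percent is made up for illustration and is not part of any Ceph tool.

```python
def degraded_percent(degraded: int, total_copies: int) -> float:
    """Return the degraded (or misplaced) count as a percentage of all copies."""
    return 100.0 * degraded / total_copies


objects = 508            # "objects: 508 objects" in the quoted status
replicas = 3             # ASSUMPTION: pool size 3, inferred from 1524 / 508
total_copies = objects * replicas

print(total_copies)                                        # 1524
print(f"{degraded_percent(197528, total_copies):.3f}%")    # 12961.155%
print(f"{degraded_percent(3, total_copies):.3f}%")         # 0.197%
```

Under that assumption the percentage can only exceed 100% if the numerator counts more degraded copies than currently exist in the pool, which would fit the observation that the count includes objects that were already removed by the cleanup.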

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com