On Thu, Oct 12, 2017 at 7:56 PM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>
>
> On Thu, Oct 12, 2017 at 10:52 AM Florian Haas <florian@xxxxxxxxxxx> wrote:
>>
>> On Thu, Oct 12, 2017 at 7:22 PM, Gregory Farnum <gfarnum@xxxxxxxxxx>
>> wrote:
>> >
>> >
>> > On Thu, Oct 12, 2017 at 3:50 AM Florian Haas <florian@xxxxxxxxxxx>
>> > wrote:
>> >>
>> >> On Mon, Sep 11, 2017 at 8:13 PM, Andreas Herrmann <andreas@xxxxxxxx>
>> >> wrote:
>> >> > Hi,
>> >> >
>> >> > how could this happen:
>> >> >
>> >> >   pgs: 197528/1524 objects degraded (12961.155%)
>> >> >
>> >> > I did some heavy failover tests, but a value higher than 100% looks
>> >> > strange (ceph version 12.2.0). Recovery is quite slow.
>> >> >
>> >> >   cluster:
>> >> >     health: HEALTH_WARN
>> >> >             3/1524 objects misplaced (0.197%)
>> >> >             Degraded data redundancy: 197528/1524 objects degraded
>> >> >             (12961.155%), 1057 pgs unclean, 1055 pgs degraded,
>> >> >             3 pgs undersized
>> >> >
>> >> >   data:
>> >> >     pools:   1 pools, 2048 pgs
>> >> >     objects: 508 objects, 1467 MB
>> >> >     usage:   127 GB used, 35639 GB / 35766 GB avail
>> >> >     pgs:     197528/1524 objects degraded (12961.155%)
>> >> >              3/1524 objects misplaced (0.197%)
>> >> >              1042 active+recovery_wait+degraded
>> >> >              991  active+clean
>> >> >              8    active+recovering+degraded
>> >> >              3    active+undersized+degraded+remapped+backfill_wait
>> >> >              2    active+recovery_wait+degraded+remapped
>> >> >              2    active+remapped+backfill_wait
>> >> >
>> >> >   io:
>> >> >     recovery: 340 kB/s, 80 objects/s
>> >>
>> >> Did you ever get to the bottom of this? I'm seeing something very
>> >> similar on a 12.2.1 reference system:
>> >>
>> >> https://gist.github.com/fghaas/f547243b0f7ebb78ce2b8e80b936e42c
>> >>
>> >> I'm also seeing an unusual MISSING_ON_PRIMARY count in "rados df":
>> >> https://gist.github.com/fghaas/59cd2c234d529db236c14fb7d46dfc85
>> >>
>> >> The odd thing in there is that the "bench" pool was empty when the
>> >> recovery started (that pool had been wiped with "rados cleanup"), so
>> >> the number of objects deemed to be missing from the primary really
>> >> ought to be zero.
>> >>
>> >> It seems like it's considering these deleted objects to still require
>> >> replication, but that sounds rather far-fetched, to be honest.
>> >
>> >
>> > Actually, that makes some sense. This cluster had an OSD down while
>> > (some of) the deletes were happening?
>>
>> I thought of exactly that too, but no, it didn't. That's the problem.
>
>
> Okay, in that case I've no idea. What was the timeline for the recovery
> versus the rados bench and cleanup versus the degraded object counts,
> then?

1. Jewel deployment with filestore.
2. Upgrade to Luminous (including mgr deployment and "ceph osd
   require-osd-release luminous"), still on filestore.
3. rados bench with subsequent cleanup.
4. All OSDs up, all PGs active+clean.
5. Stop one OSD. Remove from CRUSH, auth list, OSD map.
6. Reinitialize OSD with bluestore.
7. Start OSD, commencing backfill.
8. Degraded objects above 100%.

Please let me know if that information is useful. Thank you!

Cheers,
Florian
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
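
For anyone trying to reproduce the timeline above, steps 3 and 5-7 map onto
roughly the following commands. This is only a sketch: the OSD id (osd.3),
the device (/dev/sdb), the pool name (bench), and the use of ceph-disk
rather than ceph-volume are illustrative assumptions, not details taken
from this thread.

  # Step 3: write benchmark objects, then delete them again
  rados bench -p bench 60 write --no-cleanup
  rados -p bench cleanup

  # Step 5: stop the OSD and remove it from CRUSH, the auth list and the
  # OSD map
  systemctl stop ceph-osd@3
  ceph osd crush remove osd.3
  ceph auth del osd.3
  ceph osd rm 3

  # Step 6: wipe the device and re-create the OSD with a bluestore backend
  ceph-disk zap /dev/sdb
  ceph-disk prepare --bluestore /dev/sdb

  # Step 7: activate the new OSD (often triggered automatically by udev);
  # backfill starts once it comes up
  ceph-disk activate /dev/sdb1

Comparing "ceph -s" and "rados df" immediately before and after step 7
shows whether the degraded and MISSING_ON_PRIMARY counters already start
out non-zero for the emptied pool.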