Re: objects degraded higher than 100%

Hi,

I'm still seeing this issue during failure testing of a new Mimic 13.2.4 cluster. To reproduce:

- Start with a working Mimic 13.2.4 cluster
- Pull a disk
- Wait for recovery to complete (i.e. back to HEALTH_OK)
- Remove the OSD with `ceph osd crush remove`
- Observe degraded objects above 100% while the cluster recovers, as in the status output below (a rough command sketch follows this list)
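A minimal sketch of that sequence, assuming the pulled disk backs a hypothetical `osd.42`; only the `ceph osd crush remove` step is taken verbatim from the list above:

  # hypothetical OSD id; substitute whichever OSD backs the pulled disk
  OSD_ID=42

  # after the disk is pulled, wait for recovery to finish and the
  # cluster to return to HEALTH_OK
  ceph health          # poll until it reports HEALTH_OK

  # remove the now-empty OSD from the CRUSH map
  ceph osd crush remove osd.${OSD_ID}

  # watch the degraded percentage climb past 100% during recovery
  ceph -s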

It doesn't seem to do any harm; once recovery completes, the cluster returns to HEALTH_OK. The only tracker entry I can find covering this behaviour is bug 21803, which is marked as resolved.

Simon

  cluster:
    id:     MY ID
    health: HEALTH_WARN
            709/58572 objects misplaced (1.210%)
            Degraded data redundancy: 90094/58572 objects degraded (153.818%), 49 pgs degraded, 51 pgs undersized

  services:
    mon: 3 daemons, quorum san2-mon1,san2-mon2,san2-mon3
    mgr: san2-mon1(active), standbys: san2-mon2, san2-mon3
    osd: 52 osds: 52 up, 52 in; 84 remapped pgs

  data:
    pools:   16 pools, 2016 pgs
    objects: 19.52 k objects, 72 GiB
    usage:   7.8 TiB used, 473 TiB / 481 TiB avail
    pgs:     90094/58572 objects degraded (153.818%)
             709/58572 objects misplaced (1.210%)
             1932 active+clean
             47   active+recovery_wait+undersized+degraded+remapped
             33   active+remapped+backfill_wait
             2    active+recovering+undersized+remapped
             1    active+recovery_wait+undersized+degraded
             1    active+recovering+undersized+degraded+remapped

  io:
    client:   24 KiB/s wr, 0 op/s rd, 3 op/s wr
    recovery: 0 B/s, 126 objects/s
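
What makes that figure look impossible is the denominator: assuming 3x replication, which fits the totals above, 19.52 k objects amount to 19524 x 3 = 58572 object copies in total, yet 90094 copies are reported degraded. A quick check of the arithmetic:

  # sanity check on the numbers above, assuming 3x replication
  objects=19524            # "objects: 19.52 k" from the status output
  copies=$((objects * 3))  # 58572 copies across all replicas
  degraded=90094
  awk -v d=$degraded -v c=$copies 'BEGIN { printf "%.3f%%\n", 100*d/c }'
  # prints 153.818% -- more copies counted degraded than actually exist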


On 13/10/2017 18:53, David Zafman wrote:

I improved the code that computes degraded objects during backfill/recovery. In my testing it didn't produce percentages above 100%. I'll have to look at the code and verify that subsequent changes didn't break things.

David


On 10/13/17 9:55 AM, Florian Haas wrote:
Okay, in that case I've no idea. What was the timeline for the recovery
versus the rados bench and cleanup versus the degraded object counts,
then?
1. Jewel deployment with filestore.
2. Upgrade to Luminous (including mgr deployment and "ceph osd
require-osd-release luminous"), still on filestore.
3. rados bench with subsequent cleanup.
4. All OSDs up, all PGs active+clean.
5. Stop one OSD. Remove it from CRUSH, the auth list, and the OSD map (see the command sketch after this list).
6. Reinitialize OSD with bluestore.
7. Start OSD, commencing backfill.
8. Degraded objects above 100%.
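
A sketch of the removal in step 5, assuming the OSD in question is a hypothetical `osd.7` (Florian's message doesn't name one):

  # hypothetical OSD id for illustration
  OSD_ID=7

  systemctl stop ceph-osd@${OSD_ID}     # stop the daemon
  ceph osd crush remove osd.${OSD_ID}   # remove from the CRUSH map
  ceph auth del osd.${OSD_ID}           # drop its auth key
  ceph osd rm ${OSD_ID}                 # remove from the OSD map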

Please let me know if that information is useful. Thank you!

Hmm, that does leave me a little perplexed.
Yeah exactly, me too. :)

David, do we maybe do something with degraded counts based on the number of objects identified in pg logs? Or some other heuristic for the number of objects that might be stale? That's the only way I can think of to get these weird returning sets.
One thing that just crossed my mind: would it make a difference whether the OSD is marked out or not, in the time window between it going down and being deleted from the crushmap/osdmap? I think it shouldn't (whether marked out or simply nonexistent, it's not eligible to hold any data either way), but I'm not really sure about the mechanics of the internals here.
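
For concreteness, the two timelines being contrasted might look like this, again with a hypothetical `osd.7`; they differ only in whether the down OSD is explicitly marked out before deletion:

  # variant A: mark the down OSD out, then delete it
  ceph osd out 7
  ceph osd crush remove osd.7
  ceph auth del osd.7
  ceph osd rm 7

  # variant B: delete the down OSD without marking it out first
  ceph osd crush remove osd.7
  ceph auth del osd.7
  ceph osd rm 7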

Cheers,
Florian

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



