Re: objects degraded higher than 100%

For some reason I didn't notice that number. 

But most likely you are hitting this or a similar bug: https://tracker.ceph.com/issues/21803
 

On Wed, Mar 6, 2019, 17:30 Simon Ironside <sironside@xxxxxxxxxxxxx> wrote:
That's the misplaced objects, no problem there. Degraded objects are at
153.818%.
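(That's 90094 degraded objects reported against a total of 58572, i.e. 90094 / 58572 ≈ 1.538, which is how the figure ends up above 100%.)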

Simon

On 06/03/2019 15:26, Darius Kasparavičius wrote:
> Hi,
>
> There it's 1.2%, not 1200%.
>
> On Wed, Mar 6, 2019 at 4:36 PM Simon Ironside <sironside@xxxxxxxxxxxxx> wrote:
>> Hi,
>>
>> I'm still seeing this issue during failure testing of a new Mimic 13.2.4
>> cluster. To reproduce:
>>
>> - Working Mimic 13.2.4 cluster
>> - Pull a disk
>> - Wait for recovery to complete (i.e. back to HEALTH_OK)
>> - Remove the OSD with `ceph osd crush remove`
>> - See greater than 100% degraded objects while it recovers as below
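>>
>> Roughly, in commands (osd.N is a placeholder for whichever OSD backed
>> the pulled disk):
>>
>>     ceph -s                      # HEALTH_OK before pulling the disk
>>     # pull the disk, then wait for recovery to finish (HEALTH_OK again)
>>     ceph osd crush remove osd.N  # remove the dead OSD from the CRUSH map
>>     ceph -s                      # degraded objects now report above 100%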
>>
>> It doesn't seem to do any harm; once recovery completes the cluster
>> returns to HEALTH_OK.
>> The only bug I can find on the tracker that seems to cover this
>> behaviour is 21803, and it's marked as resolved.
>>
>> Simon
>>
>>     cluster:
>>       id:     MY ID
>>       health: HEALTH_WARN
>>               709/58572 objects misplaced (1.210%)
>>               Degraded data redundancy: 90094/58572 objects degraded (153.818%), 49 pgs degraded, 51 pgs undersized
>>
>>     services:
>>       mon: 3 daemons, quorum san2-mon1,san2-mon2,san2-mon3
>>       mgr: san2-mon1(active), standbys: san2-mon2, san2-mon3
>>       osd: 52 osds: 52 up, 52 in; 84 remapped pgs
>>
>>     data:
>>       pools:   16 pools, 2016 pgs
>>       objects: 19.52 k objects, 72 GiB
>>       usage:   7.8 TiB used, 473 TiB / 481 TiB avail
>>       pgs:     90094/58572 objects degraded (153.818%)
>>                709/58572 objects misplaced (1.210%)
>>                1932 active+clean
>>                47   active+recovery_wait+undersized+degraded+remapped
>>                33   active+remapped+backfill_wait
>>                2    active+recovering+undersized+remapped
>>                1    active+recovery_wait+undersized+degraded
>>                1    active+recovering+undersized+degraded+remapped
>>
>>     io:
>>       client:   24 KiB/s wr, 0 op/s rd, 3 op/s wr
>>       recovery: 0 B/s, 126 objects/s
>>
>>
>> On 13/10/2017 18:53, David Zafman wrote:
>>> I improved the code to compute degraded objects during
>>> backfill/recovery.  During my testing it wouldn't result in a
>>> percentage above 100%.  I'll have to look at the code and verify that
>>> some subsequent changes didn't break things.
>>>
>>> David
>>>
>>>
>>> On 10/13/17 9:55 AM, Florian Haas wrote:
>>>>>>> Okay, in that case I've no idea. What was the timeline for the
>>>>>>> recovery versus the rados bench and cleanup versus the degraded
>>>>>>> object counts, then?
>>>>>> 1. Jewel deployment with filestore.
>>>>>> 2. Upgrade to Luminous (including mgr deployment and "ceph osd
>>>>>> require-osd-release luminous"), still on filestore.
>>>>>> 3. rados bench with subsequent cleanup.
>>>>>> 4. All OSDs up, all PGs active+clean.
>>>>>> 5. Stop one OSD. Remove from CRUSH, auth list, OSD map.
>>>>>> 6. Reinitialize OSD with bluestore.
>>>>>> 7. Start OSD, commencing backfill.
>>>>>> 8. Degraded objects above 100%.
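>>>>>>
>>>>>> Roughly, for steps 5-7 (osd.N and /dev/sdX are placeholders, and
>>>>>> ceph-volume is just one way to recreate the OSD as bluestore):
>>>>>>
>>>>>>     systemctl stop ceph-osd@N
>>>>>>     ceph osd crush remove osd.N
>>>>>>     ceph auth del osd.N
>>>>>>     ceph osd rm osd.N
>>>>>>     ceph-volume lvm create --bluestore --data /dev/sdX
>>>>>>     # new OSD comes up and backfill starts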
>>>>>>
>>>>>> Please let me know if that information is useful. Thank you!
>>>>> Hmm, that does leave me a little perplexed.
>>>> Yeah exactly, me too. :)
>>>>
>>>>> David, do we maybe do something with degraded counts based on the
>>>>> number of objects identified in pg logs? Or some other heuristic for
>>>>> the number of objects that might be stale? That's the only way I can
>>>>> think of to get these weird returning sets.
>>>> One thing that just crossed my mind: would it make a difference
>>>> whether the OSD is marked out or not in the time window between it
>>>> going down and being deleted from the crushmap/osdmap? I think it
>>>> shouldn't (whether marked out or simply non-existent, it's not
>>>> eligible to hold any data either way), but I'm not really sure about
>>>> the mechanics of the internals here.
>>>>
>>>> Cheers,
>>>> Florian

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
