Re: Misplaced/Degraded objects priority

On Wed 24 Oct 2018 at 13:09, Florent B <florent@xxxxxxxxxxx> wrote:
> On a Luminous cluster with some misplaced and degraded objects after
> an outage:
>
> health: HEALTH_WARN
>             22100/2496241 objects misplaced (0.885%)
>             Degraded data redundancy: 964/2496241 objects degraded
>             (0.039%), 3 pgs degraded
>
> I can see that Ceph gives priority to moving misplaced objects rather
> than repairing degraded ones.
>
> The number of misplaced objects is decreasing, while the number of
> degraded objects is not.
> Is that expected?

I think it is. It can even increase.

My theory is that you have one or more PGs that became misplaced during
the outage, and the cluster keeps running with the replicas of those PGs
taking reads and writes during recovery. As long as there are only
reads, the PG (and the % of objects it holds) will only be misplaced,
and as the cluster slowly moves data back to where it belongs (or makes
a new copy on a new OSD), the misplaced percentage decreases.

This takes non-zero time, and if there are writes to the PG (or to
other queued PGs) while the move is running, Ceph will know that not
only is this PG lacking one or more replicas, but the recently written
data also exists in fewer copies than desired.

I guess a PG has some kind of timestamp saying "last write was at
version xyz", so when it recovers, a streaming job creates a new empty
PG, copies all data up to xyz into it, and once that is done checks
whether the original PG is still at version xyz, in which case the new
copy jumps into service directly. If the PG is instead at version
xyz+10, it asks for the last 10 changes and repeats the check.
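That "copy everything up to xyz, then replay the tail" loop can be sketched as below. This is a toy model of my guess at the mechanism, not actual Ceph code; the class and function names are made up:

```python
# Toy model of the catch-up loop: bulk-copy up to a noted version,
# then repeatedly fetch only the delta until the versions match.

class SourcePG:
    """Stand-in for the authoritative PG: its version advances on writes."""
    def __init__(self, version: int):
        self.version = version

def recover(pg: SourcePG, writes_during_copy) -> tuple[int, int]:
    """Return (final version copied, number of catch-up rounds needed)."""
    copied_up_to = pg.version        # note "last write was at version xyz"
    writes_during_copy(pg)           # bulk copy runs; clients keep writing
    rounds = 0
    while pg.version != copied_up_to:
        # Still behind (e.g. at xyz+10): fetch just the missing changes
        # and check the version again.
        copied_up_to = pg.version
        rounds += 1
    return copied_up_to, rounds
```

If no writes land during the copy, zero catch-up rounds are needed and the new copy goes straight into service.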

Since there is a queue limited by max_recovery or max_backfills, the
longer the repair takes to complete, the greater the chance of seeing
degraded as well as misplaced objects, but as the number of misplaced
objects gets close to zero, the degraded count will shrink really fast.
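A hypothetical simulation of that queueing effect, with made-up numbers: while recovery is throttled to a handful of PGs at a time, a degraded PG sitting late in the queue keeps its degraded objects counted until its turn comes, so the degraded total only drops near the end.

```python
# Illustrative only: drain a queue of PGs at `max_backfills` per tick
# and record the remaining (misplaced, degraded) totals after each tick.

def drain(pg_queue: list[dict], max_backfills: int = 1) -> list[tuple[int, int]]:
    """Process the queue in order; return per-tick remaining totals."""
    history = []
    while pg_queue:
        # Recover at most `max_backfills` PGs this tick.
        pg_queue = pg_queue[max_backfills:]
        misplaced = sum(pg["misplaced"] for pg in pg_queue)
        degraded = sum(pg["degraded"] for pg in pg_queue)
        history.append((misplaced, degraded))
    return history
```

With three PGs queued and only the last one degraded, the misplaced total falls every tick while the degraded total stays flat until that last PG is processed.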

-- 
May the most significant bit of your life be positive.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


