Hi all,
I've been lowering the weights (to 0.95) of a few OSDs in our cluster.
I have osd_max_backfills at the default of 1, so every OSD can only backfill one PG at a time.
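For reference, this is roughly what I've been doing (the OSD id below is just a placeholder for the ones I actually reweighted):

  # lower the reweight value of an OSD (example OSD id)
  ceph osd reweight 12 0.95

  # check the backfill limit on that OSD (run on its host)
  ceph daemon osd.12 config get osd_max_backfills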
So sometimes the status of the rebalanced PGs changes to just:
10 active+remapped+backfilling
But sometimes to:
8 active+remapped+backfilling
2 active+remapped+backfill_wait
I get that this is because some PGs are on the same OSD that is already backfilling.
But during the backfilling, in the case where there are backfill_wait PGs, I see a few objects becoming degraded, and that number keeps increasing the longer the backfilling takes, until the backfill_wait PGs are actually backfilling.
How do these objects become degraded? Is it not possible for Ceph to write new objects to the backfill_wait PGs, or does this have another cause?
The PG state doesn't change to active+degraded or similar, so the PG itself is not degraded, but some of its objects are?
Furthermore, I noticed that with both a replicated and an erasure coded pool, doing some reweights gives me this state, for instance:
8 active+remapped+backfilling
4 active+remapped+backfill_wait
2 active+recovery_wait+degraded
The recovery_wait PGs belong to the erasure coded pool.
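In case it matters, this is roughly how I matched those PGs to the pools (nothing here is specific to my setup):

  # list the PGs currently in recovery_wait; the pgid prefix is the pool id
  ceph pg ls recovery_wait

  # map the pool id back to the pool name
  ceph osd pool ls detail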
Shouldn't Ceph prioritize the recovery of degraded PGs before doing any backfilling of remapped PGs (the remapped PGs still have 3 copies, just not in the right places)?
Is there any mechanism to prioritize recovery over backfilling?
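For example, would something along these lines be the right approach, or is there a better knob? (I'm only guessing that these are the relevant commands/options; the pgid is just an example.)

  # push a specific degraded PG to the front of the queue (example pgid)
  ceph pg force-recovery 1.15

  # or raise the recovery op priority cluster-wide?
  ceph tell osd.* injectargs '--osd_recovery_op_priority 10'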
And can erasure coded PGs not be in a remapped state? So when reweighting, do you always end up with degraded PGs?
Sorry for the many questions, but I want to understand what is actually happening here.
Ceph version is 12.2.1
Kind regards,
Caspar