recovery scheduling

As I'm watching the mgr balancer module optimize the layout on the lab 
cluster, I'm seeing a lot of cases where the recovery scheduling in 
luminous is broken.  For example:

    pgs:     63895/153086292 objects degraded (0.042%)
             1439665/153086292 objects misplaced (0.940%)
             12166 active+clean
             198   active+remapped+backfill_wait
             97    active+remapped+backfilling
             10    active+undersized+degraded+remapped+backfill_wait
             6     active+recovery_wait+degraded
             2     active+recovery_wait+degraded+remapped
             1     active+undersized+degraded+remapped+backfilling

It is "wrong" that any PGs would be in recovery_wait (a high priority 
log-based recovery activity) when there is a ton of backfill going on.  
I've fixed this in master with a few rounds of recovery preemption PRs, 
and shaken out a few other issues in the process, but can't backport it to 
luminous without burning a feature bit.
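
(For anyone curious, the master-side change boils down to letting a 
higher-priority reservation preempt a lower-priority one that already 
holds a slot, so degraded recovery doesn't sit in recovery_wait behind a 
wall of backfill.  The sketch below is just an illustration of that idea, 
not the actual AsyncReserver code; the names and priority values are made 
up.)

    // Toy priority reserver with preemption -- illustrative only, not the
    // real Ceph AsyncReserver; names and numbers are hypothetical.
    #include <functional>
    #include <iostream>
    #include <map>
    #include <string>

    struct Reserver {
      unsigned max_slots;
      std::multimap<int, std::string> granted;                     // lowest prio first
      std::multimap<int, std::string, std::greater<int>> waiting;  // highest prio first

      explicit Reserver(unsigned slots) : max_slots(slots) {}

      void request(const std::string& pg, int prio) {
        waiting.emplace(prio, pg);
        do_queues();
      }

      void do_queues() {
        while (!waiting.empty()) {
          auto top = waiting.begin();            // highest-priority waiter
          if (granted.size() < max_slots) {
            std::cout << "grant   " << top->second << " (prio " << top->first << ")\n";
            granted.emplace(top->first, top->second);
            waiting.erase(top);
          } else if (!granted.empty() && granted.begin()->first < top->first) {
            // Preempt the lowest-priority holder (e.g. a backfill) so
            // high-priority log-based recovery isn't stuck in *_wait.
            std::cout << "preempt " << granted.begin()->second << "\n";
            granted.erase(granted.begin());
          } else {
            break;                               // nothing to grant or preempt
          }
        }
      }
    };

    int main() {
      Reserver r(1);
      r.request("pg 1.a backfill", 10);    // backfill takes the only slot
      r.request("pg 2.b recovery", 180);   // degraded recovery preempts it
    }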

I just took an inventory: we have 12 bits available, and I've marked 
8 more as deprecated that we can remove after the O release.
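
(Rough arithmetic, in case it helps: the feature set is a 64-bit mask 
exchanged at connection time, so "burning" a bit means permanently 
assigning one of the remaining unused positions.  The snippet below is 
only meant to show the accounting; the bit names are invented, not the 
real ceph_features.h entries.)

    // Back-of-the-envelope view of a 64-bit feature mask -- the actual
    // definitions live in src/include/ceph_features.h; these names are fake.
    #include <bitset>
    #include <cstdint>
    #include <iostream>

    constexpr uint64_t FEATURE_FOO = 1ULL << 0;   // hypothetical, in use
    constexpr uint64_t FEATURE_BAR = 1ULL << 1;   // hypothetical, deprecated
    // ... the rest of the assigned/retired bits ...

    int main() {
      uint64_t assigned = FEATURE_FOO | FEATURE_BAR /* | ... */;
      std::bitset<64> used(assigned);
      std::cout << "assigned: " << used.count()
                << ", free: " << 64 - used.count() << "\n";
      // Backporting recovery preemption would claim one of the free bits so
      // peers can tell whether an OSD understands the new behavior.
    }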

So... I think it's worth burning one on this.  Any objections?

And I guess also, is there anything else we wish we could backport to 
luminous but would need to burn a feature bit to do it?

sage