On 10/25/2017 03:26 PM, Sage Weil wrote:
As I'm watching the mgr balancer module optimizing the layout on the lab cluster I'm seeing a lot of cases where the recovery scheduling in luminous is broken. For example:

  pgs:   63895/153086292 objects degraded (0.042%)
         1439665/153086292 objects misplaced (0.940%)
         12166 active+clean
           198 active+remapped+backfill_wait
            97 active+remapped+backfilling
            10 active+undersized+degraded+remapped+backfill_wait
             6 active+recovery_wait+degraded
             2 active+recovery_wait+degraded+remapped
             1 active+undersized+degraded+remapped+backfilling

It is "wrong" that any PGs would be in recovery_wait (a high-priority, log-based recovery activity) when there is a ton of backfill going on. I've fixed this in master with a few rounds of recovery preemption PRs, and shaken out a few other issues in the process, but I can't backport it to luminous without burning a feature bit.

I just took an inventory and we have 12 bits available, and I just marked 8 more deprecated that we can remove after the O release. So... I think it's worth burning one on this. Any objections?
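[For readers less familiar with the issue: the complaint is that log-based recovery, which should outrank backfill, ends up queued behind it. The sketch below is a toy preemptive priority scheduler illustrating the intended behavior, not Ceph's actual implementation; the class, priority constants, and PG names are invented for the example.]

```python
import heapq

# Illustrative priorities: log-based recovery outranks backfill.
# These names and values are invented for this sketch, not Ceph constants.
PRIO_RECOVERY = 10
PRIO_BACKFILL = 1

class ToyScheduler:
    """Runs at most `slots` ops at once, always preferring the
    highest-priority waiting op, and preempting a running
    lower-priority op when a higher-priority one arrives."""
    def __init__(self, slots):
        self.slots = slots
        self.waiting = []   # max-heap via negated priority
        self.running = []   # list of (priority, name)

    def submit(self, prio, name):
        heapq.heappush(self.waiting, (-prio, name))
        self._schedule()

    def finish(self, name):
        self.running = [(p, n) for p, n in self.running if n != name]
        self._schedule()

    def _schedule(self):
        # Fill free slots from the waiting queue.
        while self.waiting and len(self.running) < self.slots:
            prio, name = heapq.heappop(self.waiting)
            self.running.append((-prio, name))
        # Preempt: if a waiting op outranks the lowest-priority
        # running op, swap them.
        while self.waiting:
            top_prio = -self.waiting[0][0]
            lowest = min(self.running, key=lambda r: r[0])
            if top_prio <= lowest[0]:
                break
            self.running.remove(lowest)
            heapq.heappush(self.waiting, (-lowest[0], lowest[1]))
            prio, name = heapq.heappop(self.waiting)
            self.running.append((-prio, name))

sched = ToyScheduler(slots=2)
sched.submit(PRIO_BACKFILL, "pg1.backfill")
sched.submit(PRIO_BACKFILL, "pg2.backfill")
sched.submit(PRIO_RECOVERY, "pg3.recovery")  # preempts a backfill
running = sorted(n for _, n in sched.running)
```

With preemption, the recovery op displaces one of the two running backfills instead of sitting in recovery_wait behind them, which is the behavior the master-branch fix restores.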
That seems reasonable. I don't have any further backports needing a feature bit in mind currently.

Josh