Re: recovery scheduling

On 10/25/2017 03:26 PM, Sage Weil wrote:
As I watch the mgr balancer module optimize the layout on the lab
cluster, I'm seeing a lot of cases where the recovery scheduling in
luminous is broken.  For example:

     pgs:     63895/153086292 objects degraded (0.042%)
              1439665/153086292 objects misplaced (0.940%)
              12166 active+clean
              198   active+remapped+backfill_wait
              97    active+remapped+backfilling
              10    active+undersized+degraded+remapped+backfill_wait
              6     active+recovery_wait+degraded
              2     active+recovery_wait+degraded+remapped
              1     active+undersized+degraded+remapped+backfilling

It is "wrong" that any PGs would be in recovery_wait (a high priority
log-based recovery activity) when there is a ton of backfill going on.
I've fixed this in master with a few rounds of recovery preemption PRs,
and shaken out a few other issues in the process, but can't backport it to
luminous without burning a feature bit.

I just took an inventory: we have 12 bits available, and I've marked
8 more as deprecated that we can remove after the O release.
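The scarcity is structural: a feature is a single bit in the 64-bit mask
peers exchange at connection time, so there are only 64 in total, and a
deprecated bit can only be reused once no supported release still
interprets it the old way. A simplified sketch of the pattern (the real
definitions live in src/include/ceph_features.h; the bit number and
names below are hypothetical):

    #include <cstdint>

    // One feature = one bit of the 64-bit feature mask peers exchange
    // in the messenger handshake; once all 64 are assigned, no new
    // feature can be gated without retiring old bits first.
    constexpr uint64_t feature_bit(int bit) { return 1ULL << bit; }

    // Hypothetical bit number, for illustration only.
    constexpr uint64_t FEATURE_RECOVERY_PREEMPTION = feature_bit(59);

    // Only speak the new (preemptible) reservation protocol when the
    // peer advertises the bit; otherwise fall back to the old behavior.
    inline bool peer_supports(uint64_t peer_features, uint64_t feature) {
      return (peer_features & feature) == feature;
    }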

So... I think it's worth burning one on this.  Any objections?

That seems reasonable. I don't currently have any other backports in
mind that would need a feature bit.

Josh