Re: Question about recovery vs backfill priorities

Ok, so here we go: https://github.com/ceph/ceph/pull/12389

I included pool_recovery_priority in the backfill priority as well. I also noticed that the priority could be negative, so I made sure this code works properly in such cases.
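
The rough idea is something like this (just an illustrative sketch of the approach, not the actual PR code; the constant names and values are placeholders):

// Illustrative sketch only, not the actual PR code; constants are placeholders.
#include <algorithm>
#include <cstdint>
#include <iostream>

constexpr int BACKFILL_PRIORITY_BASE = 140;  // placeholder base value
constexpr int PRIORITY_MAX           = 255;  // priorities must fit into 0..255

// Fold the per-pool recovery priority into the backfill priority and make
// sure a negative pool setting cannot push the result outside 0..255.
int backfill_priority(int64_t pool_recovery_priority)
{
  int64_t p = BACKFILL_PRIORITY_BASE + pool_recovery_priority;
  return static_cast<int>(std::clamp<int64_t>(p, 0, PRIORITY_MAX));
}

int main()
{
  std::cout << backfill_priority(-500) << "\n";  // clamped to 0
  std::cout << backfill_priority(0)    << "\n";  // 140
  std::cout << backfill_priority(500)  << "\n";  // clamped to 255
}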

I've been testing this locally and backfill recovery of inactive PGs is much better now.

While testing I found another problem causing IO stalls, also confirmed on 10.2.3. I made the cluster end up with some unfound objects during recovery by bringing all copies down; after bringing one OSD back up, some requests got stuck in "waiting for missing objects". If I left the cluster to fully recover, this eventually went away, but the IO stayed blocked even though there were no inactive PGs. On a different test run, when I manually restarted the OSD that had requests "waiting for missing objects", fio (which I used to generate IO load on the cluster) unblocked, so I guess those requests could have been unblocked earlier.

Is this something we could also improve?


Thanks,
Bartek

On 12/05/2016 04:18 PM, Sage Weil wrote:
On Mon, 5 Dec 2016, Bartłomiej Święcki wrote:
Hi,

I made a quick draft of how the new priority code could look; please let me
know if that's a good direction:

https://github.com/ceph/ceph/compare/master...ovh:wip-rework-recovery-priorities

Haven't tested it yet, though, so no PR yet; I will do that today.
That looks reasonable to me!

A side question: Is there any reason why pool_recovery_priority is not
adjusting backfill priority?
Maybe it would be beneficial to include it there too?
I'm guessing not... Sam?

sage



Regards,
Bartek


On 12/01/2016 11:10 PM, Sage Weil wrote:
Hi Bartek,

On Thu, 1 Dec 2016, Bartłomiej Święcki wrote:
We're currently being hit by an issue with cluster recovery. The cluster was
significantly extended (~50% new OSDs) and started recovering.
During recovery there was a HW failure and we ended up with some PGs in a
peered state with size < min_size (inactive).
Those peered PGs are waiting for backfill, but the cluster still prefers
recovery of recovery_wait PGs - in our case this could take even a few hours
before all recovery is finished (we're pushing recovery to its limits to keep
the downtime as short as possible). Those peered PGs stay blocked during this
time and the whole cluster struggles to operate at a reasonable level.

We're running hammer 0.94.6 there, and from the code it looks like recovery
will always have higher priority (jewel seems similar).
The documentation only says that log-based recovery must finish before
backfills. Is this requirement needed for data consistency, or for something else?
For a given single PG, it will do its own log recovery (to bring the acting
OSDs fully up to date) before starting backfill, but between PGs there's
no dependency.

Ideally we'd like the order to be: undersized inactive (size < min_size)
recovery_wait => undersized inactive (size < min_size) wait_backfill =>
degraded recovery_wait => degraded wait_backfill => remapped wait_backfill.
Changing the priority calculation doesn't seem to be that hard, but would it
end up with inconsistent data?
We could definitely improve this, yeah.  The prioritization code is based
around PG::get_{recovery,backfill}_priority(), and the
OSD_{RECOVERY,BACKFILL}_* #defines in PG.h.  It currently assumes (log)
recovery is always higher priority than backfill, but as you say we can do
better.  As long as everything maps into a 0..255 priority value it should
be fine.

Getting the undersized inactive bumped to the top should be a simple tweak
(just force a top-value priority in that case)...
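
Something along these lines, just to sketch the shape of it (the constant values are placeholders, not the real PG.h #defines):

// Rough sketch of the shape of the change; constants are placeholders.
#include <algorithm>
#include <cstdint>

constexpr int OSD_RECOVERY_INACTIVE_PRIORITY = 255; // top of the 0..255 range
constexpr int OSD_RECOVERY_PRIORITY_BASE     = 180; // placeholder
constexpr int OSD_BACKFILL_PRIORITY_BASE     = 140; // placeholder

struct pg_info {
  bool    inactive;        // acting set size < pool min_size
  bool    degraded;        // missing replicas but still active
  bool    needs_backfill;  // needs backfill rather than log recovery
  int64_t pool_priority;   // per-pool recovery priority option
};

// Higher value == higher priority, everything clamped into 0..255.
// Inactive PGs block client IO, so they jump to the top regardless of
// whether they need log recovery or backfill.
int op_priority(const pg_info& pg)
{
  int64_t p;
  if (pg.inactive)
    p = OSD_RECOVERY_INACTIVE_PRIORITY - (pg.needs_backfill ? 1 : 0);
  else if (!pg.needs_backfill)
    p = OSD_RECOVERY_PRIORITY_BASE + pg.pool_priority;   // degraded recovery
  else if (pg.degraded)
    p = OSD_BACKFILL_PRIORITY_BASE + pg.pool_priority;   // degraded backfill
  else
    p = OSD_BACKFILL_PRIORITY_BASE - 10 + pg.pool_priority;  // remapped backfill
  return static_cast<int>(std::clamp<int64_t>(p, 0, 255));
}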

sage



