Re: Prioritize recovery over backfilling

On 18-06-06 09:29 PM, Caspar Smit wrote:
Hi all,

We have a Luminous 12.2.2 cluster with 3 nodes, and I recently added a node to it.

osd-max-backfills is at the default of 1, so backfilling didn't go very fast, but that doesn't matter.

Once it started backfilling, everything looked OK:

~300 pgs in backfill_wait
~10 pgs backfilling (about the number of new OSDs)

But I noticed the degraded object count increasing a lot. I presume a pg that is in the backfill_wait state doesn't accept any new writes anymore, hence the increasing degraded objects?

So far so good, but once in a while I noticed a random OSD flapping (they come back up automatically). This isn't because the disk is saturated, but because of a driver/controller/kernel incompatibility which 'hangs' the disk for a short time (SCSI abort_task error in syslog). Investigating further, I noticed this was already the case before the node expansion.
This OSD flapping results in lots of pg states which are a bit worrying:

              109 active+remapped+backfill_wait
              80  active+undersized+degraded+remapped+backfill_wait
              51  active+recovery_wait+degraded+remapped
              41  active+recovery_wait+degraded
              27  active+recovery_wait+undersized+degraded+remapped
              14  active+undersized+remapped+backfill_wait
              4   active+undersized+degraded+remapped+backfilling

I think the recovery_wait is more important than the backfill_wait, so I'd like to prioritize those, because the recovery_wait was triggered by the flapping OSDs.

Furthermore, the undersized ones should get absolute priority, or is that already the case?

I was thinking about setting "nobackfill" to prioritize recovery over backfilling.
Would that help in this situation? Or would I be making it even worse?

PS: I tried increasing the heartbeat values for the OSDs to no avail; they still get flagged as down once in a while after a hiccup of the driver.

First of all, set the "nodown" flag so OSDs won't be marked down automatically, and unset it once everything backfills/recovers and settles for good. Note that there might be lingering OSD down reports, so unsetting nodown might cause some of the problematic OSDs to be marked down instantly.
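
Roughly (just a sketch, untested here):

    ceph osd set nodown      # monitors stop marking flapping OSDs down
    ... let backfill/recovery finish and settle ...
    ceph osd unset nodown    # down reports are honoured again; some OSDs may briefly drop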

Second, since Luminous you can use "ceph pg force-recovery" to ask particular pgs to recover first, even if there are other pgs waiting to backfill and/or recover.
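
Roughly like this -- the pg ids below are placeholders, use whatever shows up as recovery_wait on your cluster:

    ceph pg ls recovery_wait             # see which pgs are waiting on recovery
    ceph pg force-recovery 1.2f 1.30     # push those pgs to the front of the queue
    ceph pg cancel-force-recovery 1.2f   # drop the flag again later if needed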

--
Piotr Dałek
piotr.dalek@xxxxxxxxxxxx
https://www.ovhcloud.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



