Prioritize recovery over backfilling

Hi all,

We have a Luminous 12.2.2 cluster with 3 nodes, and I recently added a node to it.

osd_max_backfills is at the default of 1, so backfilling doesn't go very fast, but that doesn't matter.

Once it started backfilling, everything looked OK:

~300 PGs in backfill_wait
~10 PGs backfilling (roughly the number of new OSDs)

But I noticed the number of degraded objects increasing a lot. I presume a PG in the backfill_wait state doesn't accept any new writes anymore, hence the increasing number of degraded objects?

So far so good, but every once in a while I noticed a random OSD flapping (they come back up automatically). This isn't because the disks are saturated but because of a driver/controller/kernel incompatibility which 'hangs' a disk for a short time (SCSI abort_task errors in syslog). Investigating further, I noticed this was already happening before the node expansion.
 
This OSD flapping results in a lot of PG states which are a bit worrying:

             109 active+remapped+backfill_wait
             80  active+undersized+degraded+remapped+backfill_wait
             51  active+recovery_wait+degraded+remapped
             41  active+recovery_wait+degraded
             27  active+recovery_wait+undersized+degraded+remapped
             14  active+undersized+remapped+backfill_wait
             4   active+undersized+degraded+remapped+backfilling

I think the recovery_wait is more important than the backfill_wait, so I'd like to prioritize recovery, because the recovery_wait was triggered by the flapping OSDs.

Furthermore, shouldn't the undersized ones get absolute priority, or is that already the case?
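
For what it's worth, I also noticed that Luminous has "ceph pg force-recovery" and "ceph pg force-backfill" commands, so if prioritizing specific PGs is the recommended route, I assume something like the following would do it (untested on my side, the PG ids below are just placeholders):

    ceph pg ls recovery_wait            # list the PGs waiting on recovery
    ceph pg force-recovery 1.2a 1.3f    # push those PGs to the front of the queue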

I was thinking about setting the "nobackfill" flag to prioritize recovery over backfilling.
Would that help in this situation, or would I be making things even worse?
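
To be clear, what I have in mind is simply toggling the cluster-wide flag, roughly:

    ceph osd set nobackfill      # pause backfilling so recovery can proceed
    ceph osd unset nobackfill    # resume backfilling afterwards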

PS: I tried increasing the heartbeat values for the OSDs to no avail; they still get flagged as down once in a while after a hiccup of the driver.

I've injected the following settings into all OSDs and MONs:

osd_heartbeat_interval = 18 (default: 6)
osd_heartbeat_grace = 60 (default: 20)
osd_mon_heartbeat_interval = 60 (default: 30)
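
For completeness, the injection was done roughly like this (from memory, so the exact invocations may differ slightly):

    ceph tell osd.* injectargs '--osd_heartbeat_interval 18 --osd_heartbeat_grace 60 --osd_mon_heartbeat_interval 60'
    ceph tell mon.* injectargs '--osd_heartbeat_grace 60'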

Am I adjusting the right settings, or are there other settings to increase the heartbeat grace?

Do these settings require a restart of the daemons, or is injecting them sufficient?
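
I assume I can at least check whether an injected value is active on a running daemon via its admin socket, e.g. (run on the host where the OSD lives, osd.0 just as an example):

    ceph daemon osd.0 config get osd_heartbeat_grace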

PS2: The drives which are flapping are Seagate Enterprise Capacity 10TB SATA 7.2k disks, model number ST10000NM0086. Are these drives notorious for this behaviour? Does anyone have experience with these drives in a Ceph environment?

Kind regards,
Caspar Smit
