Re: strange backfill delay after outing one node

Janne Johansson <icepic.dz@xxxxxxxxx> · Wed, 14 Aug 2019 11:08:00 +0200

On Wed, 14 Aug 2019 at 09:49, Simon Oosthoek <s.oosthoek@xxxxxxxxxxxxx> wrote:
Hi all,

Yesterday I marked out all the osds on one node in our new cluster to
reconfigure them with WAL/DB on their NVMe devices, but it is taking
ages to rebalance.

> ceph tell 'osd.*' injectargs '--osd-max-backfills 16'
> ceph tell 'osd.*' injectargs '--osd-recovery-max-active 4'

Since the cluster is currently hardly loaded, backfilling can take up
all the unused bandwidth as far as I'm concerned...

Is it a good idea to give the above commands or other commands to speed
up the backfilling? (e.g. like increasing "osd max backfills")

"osd max backfills" is going to have a very large effect on recovery time, so that
would be the obvious knob to turn first. Check what it is set to now, then raise it in steps
(4, 8, 12, 16) and check at each step that it doesn't slow rebalancing down too much.
Spinning drives without any SSD/NVMe journal/WAL/DB should perhaps have 1 or 2 at most;
SSDs can take more than that, and NVMe even more, before the gains diminish.
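
A rough sketch of that stepping procedure, in case it helps. The injectargs
command is the one quoted above; osd.0, the PAUSE value and the helper names
(run, step_backfills) are only placeholders I made up for illustration. The
"ceph daemon" query goes through the admin socket, so it has to be run on the
host that carries that OSD. By default the script only prints what it would
do; set DRY_RUN=0 to actually run the commands.

```shell
#!/bin/sh
# Sketch only: raise osd_max_backfills in steps and look at the cluster
# state between steps. Names and defaults below are assumptions.
DRY_RUN="${DRY_RUN:-1}"        # 1 = print commands, 0 = execute them
STEPS="${STEPS:-4 8 12 16}"    # step values suggested in this thread
PAUSE="${PAUSE:-300}"          # hypothetical: seconds to watch each step

# Print the command in dry-run mode (shell quoting is not preserved in
# the printed form), otherwise execute it unchanged.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Apply each step to all OSDs, then show cluster status so the recovery
# rate can be compared with the previous step.
step_backfills() {
    for n in $STEPS; do
        run ceph tell 'osd.*' injectargs "--osd-max-backfills $n"
        run ceph status
        [ "$DRY_RUN" = "1" ] || sleep "$PAUSE"
    done
}

# Check the current value first (osd.0 is a placeholder OSD id).
run ceph daemon osd.0 config get osd_max_backfills
step_backfills
```

If a step makes things worse rather than better, stop there and inject the
previous value again; on plain spinning drives that point may already be
reached at 1 or 2.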

-- 
May the most significant bit of your life be positive.
