WOW!!! Those are some awfully high backfilling
settings you have there. They are 100% the reason that your
customers think your system is down. You're telling each OSD to
be able to have 20 backfill operations running at the exact same
time. I bet that if you watched iostat -x 1 on one of your
nodes before and after injecting those settings, you would see
disk utilization go from a reasonable 40-70% all the way up to
100% as soon as the settings are injected.
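
If you would rather compare the before and after numbers than eyeball them, here is a minimal Python sketch that pulls the %util column out of iostat -x output. The function names parse_util and sample_util are made up for illustration, and the sketch assumes the device table header starts with "Device" and includes a "%util" column, which is what sysstat prints on the systems I have seen.

```python
import subprocess


def parse_util(iostat_text):
    """Return {device: %util} from the last report in `iostat -x` output."""
    util = {}
    col = None  # index of the %util column once a device header is seen
    for line in iostat_text.splitlines():
        fields = line.split()
        if not fields:
            col = None  # a blank line ends the current device table
            continue
        if fields[0].startswith("Device"):
            col = fields.index("%util")
            util = {}  # a new report started: keep only the latest one
            continue
        if col is not None and len(fields) > col:
            util[fields[0]] = float(fields[col])
    return util


def sample_util(interval=1):
    """Run `iostat -x <interval> 2` and parse the second report.

    The first report covers the time since boot, so only the second
    one reflects the current load.
    """
    out = subprocess.run(
        ["iostat", "-x", str(interval), "2"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_util(out)
```

Call sample_util() once before injecting the settings and once after, and compare the values for the OSD data disks.
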
When you are backfilling, you are copying data from one
drive to another. Each increment of osd-max-backfills is
another backfill operation (think of it as another large file)
that the OSD tries to copy at the same time. These can be
receiving data (writing to the disk) or moving data off
(reading from the disk followed by a delete). So by having 20
backfills happening at a time, you are telling each disk to
allow 20 files to be written and/or read from it at the same
time. What happens to a disk when you are copying 20 large
files to it at a time? All of them move slower, largely because
of disk thrashing: 20 threads all reading and writing to
different parts of the disk.
What you want to find is the point where your disks are
usually around 80-90% utilized while backfilling, but not
consistently 100%. The easy way to do that is to increase
osd-max-backfills by 1 or 2 at a time until you see utilization
go too high, and then back off. I don't know many people that go
above 5 max backfills in a production cluster on spinning
disks. Usually the ones that do, do it temporarily while they
know their cluster isn't being utilized by customers much.
Personally I have never used osd-recovery-threads or
osd-recovery-max-active; I've been able to tune my clusters
using only osd-max-backfills. The lower you leave it, the
longer the backfill will take, but the less impact your
customers will notice. I've found 3 to be a generally safe
number if customer IO is your priority, 5 works well if your
customers can be ok with it being slow (but still usable)...
but all of this depends on your hardware and software
use cases. Test it while watching your disk utilization and
test your application while finding the right number for your
environment.
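
The "raise it a step at a time, then back off" approach can be sketched roughly like this in Python. Everything here is hypothetical scaffolding: set_backfills and measure_util are callables you would supply yourself (for example, one that injects osd-max-backfills into your OSDs and one that returns the busiest disk's %util after letting it settle), and the defaults of 5 and 90 simply mirror the numbers mentioned above.

```python
def tune_max_backfills(set_backfills, measure_util,
                       start=1, step=1, ceiling=5, too_high=90.0):
    """Step osd-max-backfills up until utilization goes too high.

    set_backfills(n) -- hypothetical callable that applies the setting
    measure_util()   -- hypothetical callable returning disk %util
    Returns the highest value tried that stayed at or below `too_high`.
    The starting value itself is assumed to be safe and is not checked.
    """
    current = start
    set_backfills(current)
    while current + step <= ceiling:
        candidate = current + step
        set_backfills(candidate)
        if measure_util() > too_high:
            # Too much load: back off to the last value that was fine.
            set_backfills(current)
            break
        current = candidate
    return current
```

In practice measure_util should wait long enough for backfill traffic to ramp up before sampling, and you would still want to test your application at the chosen value.
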
Good Luck :)