All,
Whenever we're doing some kind of recovery operation on our ceph
clusters (cluster expansion or dealing with a drive failure), there
seems to be a fairly noticeable performance drop while the backfills
run (the last time I measured it, performance during recovery was
something like 20% of a healthy cluster). I'm wondering if there are
any settings we might be missing that would improve this situation?
Before doing any kind of expansion operation I make sure both 'noscrub'
and 'nodeep-scrub' are set so that scrubbing isn't making things
worse.
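For reference, this is roughly what I run before and after the expansion
(just the standard 'ceph osd set'/'unset' flags):

    # pause scrubbing before the expansion
    ceph osd set noscrub
    ceph osd set nodeep-scrub

    # ... do the expansion / wait for recovery to finish ...

    # re-enable scrubbing once the cluster is healthy again
    ceph osd unset noscrub
    ceph osd unset nodeep-scrub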
We also have the following options set in our ceph.conf:
[osd]
osd_journal_size = 16384
osd_max_backfills = 1
osd_recovery_max_active = 1
osd_recovery_op_priority = 1
osd_recovery_max_single_start = 1
osd_op_threads = 12
osd_crush_initial_weight = 0
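(In case it's useful to anyone: I believe the recovery throttles above can
also be injected at runtime so they take effect without restarting the OSDs,
assuming your release supports injectargs. Something along these lines:)

    # apply the recovery throttles to all OSDs without a restart
    ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1 --osd_recovery_op_priority 1 --osd_recovery_max_single_start 1'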
I'm wondering if there might be a way to use ionice with the CFQ scheduler
to put the recovery traffic into the Idle class so that customer
traffic has a higher priority?
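For what it's worth, the only knobs I'm aware of in that direction are the two
below, but I'm not sure they actually help here: as far as I can tell they only
set the I/O priority of the OSD's disk thread (scrubbing/snap trimming rather
than backfill), and they only do anything when the OSD data disk is really
using the CFQ scheduler.

    # check the scheduler on the OSD data disk first (sdb is just an example)
    cat /sys/block/sdb/queue/scheduler

    # ceph.conf, [osd] section
    osd_disk_thread_ioprio_class = idle
    osd_disk_thread_ioprio_priority = 7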
Thanks,
Bryan