Re: pausing "recovery" when adding new machine

"Michael J. Kidd" <michael.kidd@xxxxxxxxxxx> · Fri, 7 Mar 2014 19:19:06 -0500

It will attempt to fill them as fast as possible; however, you can limit the refill rate and the number of disks being recovered with the following:

# ceph osd tell \* injectargs '--osd_backfill_scan_min 16 --osd_backfill_scan_max 32 --osd_recovery_op_priority 1'

# ceph osd tell \* injectargs '--osd_recovery_max_active 1'

This will allow all PG placements to be calculated across all the newly added disks (when combined with the steps in my previous email), but only one disk will be filled at a time, and at a low priority. This will minimize the overall performance impact.
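For anyone scripting this, the two emails in this thread can be combined into one sequence. What follows is only a rough sketch, not a tested procedure: the helper names run and throttled_osd_add are made up for illustration, and the order (throttle first, then unset 'noup') is an assumption, since the thread does not spell it out. It defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch only: combines the 'noup' flag with the injectargs throttles above.
# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Hypothetical helper: echo the command in dry-run mode, else execute it.
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

throttled_osd_add() {
    # Keep newly created OSDs from being marked 'up' while they are added.
    run ceph osd set noup

    # ... add the new OSDs here (for example with ceph-deploy) ...

    # Throttle backfill and recovery, using the values from this thread.
    run ceph osd tell '*' injectargs '--osd_backfill_scan_min 16 --osd_backfill_scan_max 32 --osd_recovery_op_priority 1'
    run ceph osd tell '*' injectargs '--osd_recovery_max_active 1'

    # Allow the new OSDs to come up; the throttled refill starts from here.
    run ceph osd unset noup
}

throttled_osd_add
```

Injected values only apply to the daemons that receive them, so it may be worth repeating the two injectargs commands once the new OSDs are up.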

For more details on these settings, along with their default values (so you can restore or adjust them as you like after the addition), please see:
http://ceph.com/docs/master/rados/configuration/osd-config-ref/#operations
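Before changing anything, it can help to note what the running OSDs currently use, so the values can be put back afterwards. A minimal sketch, assuming osd.0 runs on the local host and uses the default admin socket path; the function name show_recovery_settings is hypothetical, and the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: query one OSD's admin socket for the options changed above.
# Must be run on the host where that OSD daemon lives.
DRY_RUN="${DRY_RUN:-1}"                        # 1 = only print the commands
OSD_ID="${OSD_ID:-0}"                          # assumption: osd.0 is local
SOCK="/var/run/ceph/ceph-osd.${OSD_ID}.asok"   # assumption: default socket path

show_recovery_settings() {
    for opt in osd_backfill_scan_min osd_backfill_scan_max \
               osd_recovery_op_priority osd_recovery_max_active; do
        if [ "$DRY_RUN" = "1" ]; then
            printf '%s\n' "ceph --admin-daemon $SOCK config get $opt"
        else
            ceph --admin-daemon "$SOCK" config get "$opt"
        fi
    done
}

show_recovery_settings
```

The recorded values can later be passed back through injectargs in the same way as the throttled ones.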

Thanks,

Michael J. Kidd
Sr. Storage Consultant
Inktank Professional Services

On Fri, Mar 7, 2014 at 5:10 PM, John Kinsella <jlk@xxxxxxxxxxxx> wrote:

Does rebalancing across multiple new OSDs at once like that affect cluster performance more or less than one at a time?

On Mar 7, 2014, at 2:04 PM, Michael J. Kidd <michael.kidd@xxxxxxxxxxx> wrote:

> Hello Sid,
>
> You may try setting the 'noup' flag (instead of the 'noout' flag). This would prevent new OSDs from being set 'up', and therefore the data rebalance shouldn't occur. Once you have added all OSDs, unset the 'noup' flag and ensure they're set 'up' automatically. If not, use 'ceph osd up <osdid>' to bring them up manually.
>
> Hope this helps!
>
> Michael J. Kidd
> Sr. Storage Consultant
> Inktank Professional Services
>
> On Fri, Mar 7, 2014 at 3:06 PM, Sidharta Mukerjee <smukerjee99@xxxxxxxxx> wrote:
> When I use ceph-deploy to add a bunch of new OSDs (from a new machine), the ceph cluster starts rebalancing immediately. As a result, the first couple of OSDs start properly, but the last few can't start because I keep getting a "timeout problem", as shown here:
>
> [root@ia6 ia_scripts]# service ceph start osd.24
> === osd.24 ===
> failed: 'timeout 10 /usr/bin/ceph --name=osd.24 --keyring=/var/lib/ceph/osd/ceph-24/keyring osd crush create-or-move -- 24 1.82 root=default host=ia6
>
> Is there a way I can pause the "recovery" so that the overall system behaves much faster and I can then start all the OSDs, make sure they're up and look "normal" (via ceph osd tree), and then unpause recovery?
>
> -Sid

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com