On 11/28/16 10:02, Kevin Olbrich wrote:
I think the general statement is that if your cluster is very small, you might want to do it more gradually, and if it's large, a whole node at once might not cause any noticeable effect. Doing it all at once will also cause less overall data movement: the data moves once, straight to its final place. Either way, make sure you know about settings such as:

  ceph osd set noscrub
  ceph osd set nodeep-scrub

  # 1 makes recovery slow... raise it to where you can still tolerate the load
  osd max backfills = 1
  osd recovery max active = 1
  osd recovery op priority = 1
  osd recovery max single start = 1
  osd op threads = 12
  # you probably have this already (default)
  osd client op priority = 63

And if that's not enough, I found this one worked better than the rest (for my 27-OSD, 3-node cluster, 0.6 here with 2 max active was tolerable and faster than 1 max active with no sleep set):

  osd recovery sleep = 0.6

And when you want to give it a rest due to some issue:

  ceph osd set nobackfill
  ceph osd set norecover

To change the config options at runtime, you can use a command like:

  ceph tell osd.* injectargs --osd-max-backfills=1

And I'm sure I missed some options; someone else can mention them too.
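One more practical note, as a minimal sketch to go with the above: the flags are cleared with the matching "unset" commands, and injectargs also accepts several options at once if you quote them (the option names here are just the ones above in dashed form):

  # clear the flags once recovery/backfill has settled
  ceph osd unset noscrub
  ceph osd unset nodeep-scrub
  ceph osd unset nobackfill
  ceph osd unset norecover

  # inject several throttling options in one go
  ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-sleep 0.6'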
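Also, injectargs only changes values at runtime, so if you want the throttling to survive OSD restarts, put the same options in ceph.conf. A minimal sketch, using just the values mentioned above:

  [osd]
  osd max backfills = 1
  osd recovery max active = 1
  osd recovery op priority = 1
  osd recovery sleep = 0.6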