ceph osd crush tunables optimal AND add new OSD at the same time

greg@xxxxxxxxxxx (Gregory Farnum) · Wed, 16 Jul 2014 16:52:24 -0700

On Wed, Jul 16, 2014 at 4:45 PM, Craig Lewis <clewis at centraldesktop.com> wrote:
> One of the things I've learned is that many small changes to the cluster are
> better than one large change.  Adding 20% more OSDs?  Don't add them all at
> once, trickle them in over time.  Increasing pg_num & pgp_num from 128 to
> 1024?  Go in steps, not one leap.
>
> I try to avoid operations that will touch more than 20% of the disks
> simultaneously.  When I had journals on HDD, I tried to avoid going over 10%
> of the disks.
>
>
> Is there a way to execute `ceph osd crush tunables optimal` in a way that
> takes smaller steps?

Unfortunately not; the CRUSH tunables change the core placement
algorithm itself, so switching them takes effect across the whole
cluster at once.

If I blue-sky it, I suppose we could do something like ship multiple
CRUSH configurations along with the hash ranges within which you use
each one, and then incrementally move the dividing lines from 0 over to
1. But that'd be a big change: it needs protocol support (*none* of the
currently-deployed software could work with a cluster in such a
hypothetical state) and testing of everything against it, and it is not
on anybody's roadmap. I wouldn't expect anybody to be changing CRUSH
tunables on a non-toy cluster; there's a reason we had the config
option for whether you get warnings or not. :)
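
To make the blue-sky idea above a bit more concrete, here is a toy
sketch of the "hash range with a moving dividing line" scheme. Nothing
here is real Ceph or CRUSH code: place_old and place_new are
hypothetical stand-ins (plain hashing) for placement under the old and
new tunables, and place_blended, unit_hash and moved_fraction are names
made up for illustration. It only shows why sliding the divider from 0
to 1 would bound how much data moves per step.

```python
import hashlib


def unit_hash(key: str) -> float:
    """Map a key to a stable float in [0, 1) using 53 bits of SHA-256."""
    digest = hashlib.sha256(key.encode()).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def _bucket(key: str, num_osds: int) -> int:
    """Pick an OSD index from a hash; a stand-in for a real placement rule."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_osds


def place_old(pg: str, num_osds: int) -> int:
    # Hypothetical placement under the old tunables (NOT real CRUSH).
    return _bucket("old:" + pg, num_osds)


def place_new(pg: str, num_osds: int) -> int:
    # Hypothetical placement under the new tunables (NOT real CRUSH).
    return _bucket("new:" + pg, num_osds)


def place_blended(pg: str, num_osds: int, divider: float) -> int:
    """Use the new placement for PGs hashing below the divider, else the old."""
    if unit_hash("range:" + pg) < divider:
        return place_new(pg, num_osds)
    return place_old(pg, num_osds)


def moved_fraction(pgs, num_osds: int, before: float, after: float) -> float:
    """Fraction of PGs whose placement changes when the divider moves."""
    moved = sum(
        1
        for pg in pgs
        if place_blended(pg, num_osds, before) != place_blended(pg, num_osds, after)
    )
    return moved / len(pgs)


if __name__ == "__main__":
    pgs = [f"pg.{i}" for i in range(1000)]
    steps = [i / 10 for i in range(11)]  # divider goes 0.0, 0.1, ..., 1.0
    for before, after in zip(steps, steps[1:]):
        frac = moved_fraction(pgs, 10, before, after)
        print(f"divider {before:.1f} -> {after:.1f}: {frac:.1%} of PGs move")
```

Each step can only move PGs whose range hash falls between the old and
new divider positions, so ten steps of 0.1 would each touch roughly a
tenth of the PGs instead of all of them at once. The catch is the one
already mentioned: every client and daemon would have to understand the
blended state.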
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com