Re: mgr balancer module

Sage Weil <sweil@xxxxxxxxxx> · Thu, 3 Aug 2017 15:53:55 +0000 (UTC)

Hi Spandan,

On Thu, 3 Aug 2017, Spandan Kumar Sahu wrote:
> Sage
> 
> I think it would be a good idea to include a command in the balancer
> module itself, that would optimize the crushmap using the
> python-crush, and set the optimized crushmap.
> 
> As far as I believe, uneven distributions can be majorly attributed to
> the factors:
> * using an unoptimized crushmap
> * unevenness that occurs due to the (pseudo) random nature of CRUSH
> * objects having different sizes.
> 
> If we set an optimized crushmap, at the very initial stages, we have
> to move very less data in the due course, in order to maintain a
> proper distribution. Hence the necessity of including it in the
> balancer module. Please give a look at the PR[1], I sent in this
> regard, and let me know if I am moving in the right direction.

There are a few problems with using python-crush, the main one being that 
the dependencies are problematic: it's built from a forked repo and is not 
packaged properly (has to be installed with pip).  It also may not 
match the CRUSH version being used by the cluster.

The larger issue, though, is that it doesn't address all of the other
problems I highlighted in my earlier email.  The main thing it *does* do
properly is base the optimization on a model; the lack of one was the main
problem with the old reweight-by-utilization.  The new framework in
balancer.py has all the pieces now to let you do that.

I think the main value in the python-crush optimize code is that it 
demonstrably works, which means we know that the cost/score function being 
used and the descent method work together.  I think the best path forward 
is to look at the core of what those two pieces are doing and port it into 
the balancer environment.  Most recently I've been working on the
'eval' method that will generate a score for a given distribution.  I'm
working from first principles there (just calculating the layout, its
deviation from the target, the standard deviation, etc.), and I'm not
sure what Loic's optimizer was doing.  Also, my first attempt at a descent
function to correct weights was pretty broken, and I know a lot of
experimentation went into Loic's method.
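
To make the first-principles scoring above concrete, here is a rough
sketch of the idea.  It is not the code in balancer.py and not what
python-crush does: the function name, the input dictionaries, and the
choice of a root-mean-square figure as the single score are all
assumptions made for illustration.

```python
import math


def score_distribution(pgs_per_osd, weights):
    """Score a PG layout against the weight-proportional target.

    Hypothetical helper for illustration only.
    pgs_per_osd: {osd_id: number of PGs currently mapped to that OSD}
    weights:     {osd_id: CRUSH weight of that OSD}

    Returns (deviations, score) where deviations maps each OSD to its
    relative deviation from target ((actual - target) / target) and
    score is the root-mean-square of those deviations (0.0 is perfect).
    """
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")

    total_pgs = sum(pgs_per_osd.values())
    total_weight = sum(weights.values())
    if total_pgs == 0 or total_weight == 0:
        raise ValueError("need at least one PG and one weighted OSD")

    deviations = {}
    for osd, weight in weights.items():
        if weight == 0:
            # A zero-weight OSD has no target share, so skip it.
            continue
        target = total_pgs * weight / total_weight
        actual = pgs_per_osd.get(osd, 0)
        deviations[osd] = (actual - target) / target

    score = math.sqrt(
        sum(d * d for d in deviations.values()) / len(deviations)
    )
    return deviations, score


if __name__ == "__main__":
    weights = {0: 1.0, 1: 1.0, 2: 2.0}
    layout = {0: 30, 1: 20, 2: 50}
    devs, score = score_distribution(layout, weights)
    print(devs, score)
```

A descent step would then adjust weights and keep a change only if it
lowers this score on the modeled layout, which is the part where Loic's
experimentation is likely to matter most.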

Do you see any problems with that approach, or things that the 
balancer framework does not cover?

Thanks!
sage
