Re: mgr balancer module

On Thu, Aug 3, 2017 at 9:23 PM, Sage Weil <sweil@xxxxxxxxxx> wrote:
> Hi Spandan,
>
> On Thu, 3 Aug 2017, Spandan Kumar Sahu wrote:
>> Sage
>>
>> I think it would be a good idea to include a command in the balancer
>> module itself, that would optimize the crushmap using the
>> python-crush, and set the optimized crushmap.
>>
>> As far as I can tell, uneven distributions can mostly be attributed to
>> the following factors:
>> * using an unoptimized crushmap
>> * unevenness that occurs due to the (pseudo) random nature of CRUSH
>> * objects having different sizes.
>>
>> If we set an optimized crushmap at the very beginning, we have to
>> move much less data later on in order to maintain a proper
>> distribution. Hence the necessity of including it in the balancer
>> module. Please take a look at the PR[1] I sent in this regard, and
>> let me know if I am moving in the right direction.
>
> There are a few problems with using python-crush, the main one being that
> the dependencies are problematic: it's built from a forked repo and is not
> packaged properly (has to be installed with pip).  It also may not
> match the CRUSH version being used by the cluster.
>
I tried adding it to install-deps.sh, but I missed the fact that
python-crush will not be updated along with changes to the crush
directory of the Ceph source.

> The larger issue though is that it doesn't address all of the other
> problems I highlighted in my earlier email.  The main thing it *does* do
> properly is the optimization based on a model; this was the main
> problem with the old reweight-by-utilization.  The new framework in
> balancer.py has all the pieces now to let you do that.
>

Okay, maybe then I will try to port only the logic of Loic's work and
see how this works.

> I think the main value in the python-crush optimize code is that it
> demonstrably works, which means we know that the cost/score function being
> used and the descent method work together.  I think the best path forward
> is to look at the core of what those two pieces are doing and port it into
> the balancer environment.  Most recently I've been working on the
> 'eval' method that will generate a score for a given distribution, but I'm
> working from first principles (just calculating the layout, its deviation
> from the target, the standard deviation, etc.) but I'm not sure what

I had done some work on assigning a score to the distribution
in this PR[1]. However, it was done in the pre-existing
reweight-by-utilization. Would you take a look at it and let me
know whether I should proceed to port it into the balancer module?
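
To make the scoring idea concrete, here is a minimal sketch of a
first-principles score of the kind discussed above: the standard
deviation of each OSD's actual share from its target share. The
function name, the dict-based inputs and the choice of "share of the
total" as the unit are assumptions for illustration only; this is not
the code in the PR or in balancer.py.

```python
import math


def distribution_score(actual, target):
    """Return the standard deviation of (actual share - target share).

    actual: dict mapping OSD id -> observed PG count (or bytes).
    target: dict mapping OSD id -> desired PG count (or CRUSH weight).
    Both are hypothetical inputs; 0.0 means a perfect match.
    """
    total_actual = sum(actual.values())
    total_target = sum(target.values())
    if total_actual <= 0 or total_target <= 0:
        raise ValueError("actual and target must have positive totals")

    deviations = []
    for osd, wanted in target.items():
        got = actual.get(osd, 0)
        # Compare normalized shares so the units of the two inputs
        # (PGs, bytes, weights) do not need to match.
        deviations.append(got / total_actual - wanted / total_target)

    return math.sqrt(sum(d * d for d in deviations) / len(deviations))
```

A lower score means the layout is closer to the target, so a descent
method would try to minimize it.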

> Loic's optimizer was doing.  Also, my first attempt at a descent function
> to correct weights was pretty broken, and I know a lot of experimentation
> went into Loic's method.
>

Loic's optimizer only fixed defects in the crushmap, and was not (in
the true sense) a reweight-by-utilization.
In short, Loic's optimizer optimized a pool on the basis of a
rule, and then ran a simulation to determine the new weights. Using
the `take` step in the rule, it determined a list of OSDs and moved
weight (about 1% of the overload%) from one OSD to another. This way,
the weights of the buckets on the next hierarchical level of the
crush tree weren't affected. I went through Loic's optimizer in
detail and also added my own improvements.
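
As a rough illustration of the weight-moving step described above,
here is a sketch of a single iteration: it takes weight from the most
overfilled OSD and gives it to the most underfilled one, so the sum of
the weights (and hence the parent bucket's weight) stays the same. The
function name, the inputs, and the reading of "1% of the overload" as
weight * overload * 0.01 are my assumptions; this is not python-crush's
actual code. It assumes all weights are positive.

```python
def shift_weight_step(weights, pgs, step_fraction=0.01):
    """Move a small amount of weight from the most overfilled OSD to
    the most underfilled one, keeping the total weight constant.

    weights: dict OSD id -> current CRUSH weight (all > 0).
    pgs:     dict OSD id -> PG count from a simulated mapping.
    Returns a new weights dict; the input is not modified.
    """
    total_w = sum(weights.values())
    total_pgs = sum(pgs.values())

    # Overload is the relative excess of an OSD's PG share over its
    # weight share: 0.2 means 20% more PGs than its weight calls for.
    overload = {
        osd: (pgs[osd] / total_pgs) / (weights[osd] / total_w) - 1.0
        for osd in weights
    }
    over = max(overload, key=overload.get)
    under = min(overload, key=overload.get)

    new_weights = dict(weights)
    if over == under or overload[over] <= 0:
        return new_weights  # already balanced, nothing to move

    # Assumed step size: a small fraction of the overload.
    delta = weights[over] * overload[over] * step_fraction
    new_weights[over] -= delta
    new_weights[under] += delta
    return new_weights
```

In practice this step would be repeated, re-running the simulation
after each move, until the score stops improving.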

I will try to port the logic, but I am not sure where the optimizer
would fit in. Would it go in as a separate function in module.py, or
would it have a different implementation for each of upmaps, crush,
and crush-compat? Loic's python-crush didn't take upmaps into account,
but the logic will apply in the case of upmaps too.

> Do you see any problems with that approach, or things that the
> balancer framework does not cover?
>

I was hoping that we would have an optimizer that fixes the faults in
the crushmap whenever a crushmap is set and/or a device gets added or
removed. The current balancer would also fix it, but it would take
much more time and much more data movement to achieve a better
distribution than if we had fixed the crushmap itself at the
very beginning. Nevertheless, the balancer module will eventually
reach a reasonably good distribution.

Correct me if I am wrong. :)

[1]: https://github.com/ceph/ceph/pull/16361/files#diff-ecab4c883be988760d61a8a883ddc23fR4559

> Thanks!
> sage
>



-- 
Spandan Kumar Sahu
IIT Kharagpur