Re: mgr balancer module

On Fri, 4 Aug 2017, Spandan Kumar Sahu wrote:
> On Thu, Aug 3, 2017 at 9:23 PM, Sage Weil <sweil@xxxxxxxxxx> wrote:
> > I think the main value in the python-crush optimize code is that it
> > demonstrably works, which means we know that the cost/score function being
> > used and the descent method work together.  I think the best path forward
> > is to look at the core of what those two pieces are doing and port it into
> > the balancer environment.  Most recently I've been working on the
> > 'eval' method that will generate a score for a given distribution.  I'm
> > working from first principles (just calculating the layout, its deviation
> > from the target, the standard deviation, etc.), so I'm not sure what
> 
> I had done some work on assigning a score to the distribution
> in this[1] PR.  It was, however, done in the pre-existing
> reweight-by-utilization.  Would you take a look at it and let me
> know if I should proceed to port it into the balancer module?

This seems reasonable...  I'm not sure we can really tell what the best 
function is without trying it in combination with some optimization 
method, though.

I just pushed a semi-complete/working eval function in the wip-balancer 
branch that uses a normalized standard deviation for pgs, objects, and 
bytes.  (Normalized meaning the standard deviation is divided by the 
total count of pgs or objects or whatever so that it is unitless.)  The 
final score is just the average of those three values.  Pretty sure that's 
not the most sensible thing, but it's a start.  FWIW I can do

 bin/init-ceph stop
 MON=1 OSD=8 MDS=0 ../src/vstart.sh -d -n -x -l 
 bin/ceph osd pool create foo 64 
 bin/ceph osd set-require-min-compat-client luminous
 bin/ceph balancer mode upmap
 bin/rados -p foo bench 10 write -b 4096 --no-cleanup
 bin/ceph balancer eval
 bin/ceph balancer optimize foo
 bin/ceph balancer eval foo
 bin/ceph balancer execute foo
 bin/ceph balancer eval 

and the score goes from .02 to .001 (and pgs get balanced).
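
To make the scoring idea concrete, here is a rough Python sketch of the 
normalized-standard-deviation score described above (the function names 
are made up; this is not the actual wip-balancer eval code):

 # Sketch only: per-OSD counts for each metric are reduced to a standard
 # deviation, normalized by the total count so the result is unitless,
 # and the final score is the average over pgs, objects, and bytes.
 from math import sqrt

 def normalized_stddev(counts):
     """Standard deviation of per-OSD counts, divided by the total count."""
     total = sum(counts)
     if total == 0:
         return 0.0
     mean = total / len(counts)
     variance = sum((c - mean) ** 2 for c in counts) / len(counts)
     return sqrt(variance) / total

 def score_distribution(pgs_per_osd, objects_per_osd, bytes_per_osd):
     """Average of the three normalized standard deviations; 0 is perfect."""
     parts = [normalized_stddev(pgs_per_osd),
              normalized_stddev(objects_per_osd),
              normalized_stddev(bytes_per_osd)]
     return sum(parts) / len(parts)

A perfectly even distribution scores 0 and the score grows as the per-OSD 
counts drift apart, so balancing drives it toward zero, which is the 
.02 -> .001 drop above.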

> > Loic's optimizer was doing.  Also, my first attempt at a descent function
> > to correct weights was pretty broken, and I know a lot of experimentation
> > went into Loic's method.
> >
> 
> Loic's optimizer only fixed defects in the crushmap, and was not (in
> the true sense) a reweight-by-utilization.
> In short, Loic's optimizer optimized a pool on the basis of a rule and
> then ran a simulation to determine the new weights.  Using the `take`
> step in the rule, it determined a list of OSDs and moved weight (about
> 1% of the overload%) from one OSD to another, so that the weights of
> the buckets at the next level up in the crush tree weren't affected (a
> rough sketch of this weight-moving step appears after this quote).  I
> went through Loic's optimizer in detail and also added my own
> improvements.
> 
> I will try to port the logic, but I am not sure where the optimizer
> would fit in.  Would it go in as a separate function in module.py, or
> would it have different implementations for each of upmap, crush, and
> crush-compat?  Loic's python-crush didn't take upmaps into account,
> but the logic should apply to upmaps too.
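
For illustration, here is a rough Python sketch of the weight-moving step 
described in the quoted paragraph above (the names and the 1%-per-step 
policy are simplifications, not Loic's actual python-crush code):

 # Illustration only: shift a small amount of crush weight from the most
 # overfull OSD to the most underfull one.  The sum of the weights (and
 # therefore the weight of the parent bucket) is unchanged.

 def rebalance_step(weights, utilization, target, step_fraction=0.01):
     """weights/utilization/target are dicts keyed by osd id; target is
     the expected share for each OSD."""
     overload = {osd: utilization[osd] - target[osd] for osd in weights}
     worst = max(overload, key=overload.get)   # most overfull
     best = min(overload, key=overload.get)    # most underfull
     if overload[worst] <= 0:
         return False                          # already balanced
     delta = weights[worst] * step_fraction * overload[worst]
     weights[worst] -= delta
     weights[best] += delta                    # total weight preserved
     return True

Repeating small steps like this (and re-simulating placement after each 
one) keeps the change local to the OSDs selected by the `take`, which is 
how the buckets at the next level up keep their weights.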

The 'crush-compat' mode in balancer is the one to target.  There is a 
partial implementation there that needs to be updated to use the new 
framework; I'll fiddle with it a bit more to make it use the new Plan 
approach (currently it makes changes directly to the cluster, 
which doesn't work well!).  For now the latest is at

	https://github.com/ceph/ceph/pull/16272

You can ignore the other modes (upmap etc) for now.  Eventually we could 
make it so that transitioning from one mode to another will somehow phase 
out the old changes, but that's complicated and not needed yet.
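
To make the Plan idea concrete, here is a minimal, purely illustrative 
sketch (the class and names below are invented and don't reflect the 
actual module.py code in that PR):

 # Illustration only: proposed changes accumulate in a Plan built against
 # a snapshot of the osdmap, and nothing touches the cluster until the
 # plan is explicitly executed.

 class Plan:
     def __init__(self, name, osdmap_snapshot):
         self.name = name
         self.osdmap = osdmap_snapshot   # work on a copy, not the live map
         self.compat_weights = {}        # osd id -> proposed weight

     def propose_weight(self, osd, weight):
         self.compat_weights[osd] = weight

 def execute(plan, apply_change):
     """Apply the accumulated proposals via a caller-supplied callback."""
     for osd, weight in plan.compat_weights.items():
         apply_change(osd, weight)

Keeping proposals separate from the live cluster is also what lets the 
'balancer eval foo' / 'balancer execute foo' steps above score a plan 
before anything is applied.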

> > Do you see any problems with that approach, or things that the
> > balancer framework does not cover?
> >
> 
> I was hoping that we would have an optimizer that fixes the faults in
> the crushmap whenever a crushmap is set and/or a new device gets added
> or removed.  The current balancer would also fix it, but it would take
> much more time and much more data movement to reach a good
> distribution than if we had fixed the crushmap itself at the very
> beginning.  Nevertheless, the balancer module will eventually reach a
> reasonably good distribution.
> 
> Correct me if I am wrong. :)

No, I think you're right.  I don't expect that people will be importing 
crush maps that often, though... and if they do they are hopefully clever 
enough to do their own thing.  The goal is for everything to be manageable 
via the CLI or (better yet) simply handled automatically by the system.

I think the main thing to worry about is the specific cases that people 
are likely to encounter (and tend to complain about), like adding new 
devices and wanting the system to weight them in gradually.
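
For that case, the idea is roughly something like the following (purely 
illustrative; this helper just steps the crush weight up with the 
standard CLI, it's not how the balancer will implement it):

 # Illustration only: weight a new OSD in gradually by raising its crush
 # weight in small increments, pausing between steps so backfill can
 # keep up.
 import subprocess
 import time

 def weight_in_gradually(osd_name, target_weight, steps=10, interval=300):
     for i in range(1, steps + 1):
         weight = target_weight * i / steps
         subprocess.check_call(['ceph', 'osd', 'crush', 'reweight',
                                osd_name, '%.4f' % weight])
         time.sleep(interval)

 # e.g. weight_in_gradually('osd.8', 1.82, steps=10, interval=600)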

sage



> 
> [1]: https://github.com/ceph/ceph/pull/16361/files#diff-ecab4c883be988760d61a8a883ddc23fR4559
> 
> > Thanks!
> > sage
> >
> 
> 
> 
> -- 
> Spandan Kumar Sahu
> IIT Kharagpur