Re: crush multipick anomaly

Hi Pedro,

Thanks for taking a look at this!  It's a frustrating problem and we 
haven't made much headway.

On Thu, 2 Mar 2017, Pedro López-Adeva wrote:
> Hi,
> 
> I will have a look. BTW, I have not progressed that much but I have
> been thinking about it. In order to adapt the previous algorithm in
> the python notebook, I need to replace the iteration over all
> possible device permutations with an iteration over all the possible
> selections that crush would make. That is the main thing I need to
> work on.
> 
> The other thing is of course that weights change for each replica.
> That is, they cannot really be fixed in the crush map. So the
> algorithm inside libcrush, not only the weights in the map, needs to
> be changed. The weights in the crush map should then reflect, maybe,
> the desired usage frequencies. Or maybe each replica should have its
> own crush map, but then the information about the previous selection
> should be passed to the next replica placement run so it avoids
> selecting the same device again.
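To make the multipick effect concrete, here is a small Python model of the enumeration Pedro describes: instead of iterating over device permutations, it walks every ordered selection of distinct devices and accumulates per-rank probabilities. It assumes (a simplification, not libcrush's actual straw2 code) that each rank picks a device in proportion to its weight and redraws on a collision, which amounts to sampling among the not-yet-chosen devices by weight. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def selection_probability(weights, selection):
    """Probability of drawing `selection` in order, assuming each rank picks a
    device in proportion to its weight and redraws on a collision (equivalent
    to sampling among the not-yet-chosen devices by weight)."""
    remaining = sum(weights.values())
    p = Fraction(1)
    for dev in selection:
        p *= Fraction(weights[dev], remaining)
        remaining -= weights[dev]
    return p


def per_rank_distribution(weights, num_replicas):
    """For each rank, the probability that each device is placed there,
    summed over every ordered selection of distinct devices."""
    dist = [{dev: Fraction(0) for dev in weights} for _ in range(num_replicas)]
    for selection in permutations(weights, num_replicas):
        p = selection_probability(weights, selection)
        for rank, dev in enumerate(selection):
            dist[rank][dev] += p
    return dist


# Integer weights keep the arithmetic exact with Fraction.
weights = {"osd.0": 1, "osd.1": 1, "osd.2": 1, "osd.3": 2}
for rank, dist in enumerate(per_rank_distribution(weights, 2)):
    print(rank, {dev: str(p) for dev, p in dist.items()})
```

With weights 1/1/1/2, rank 0 matches the weights exactly (osd.3 at 2/5), but rank 1 gives osd.3 only 3/10. Across two replicas osd.3 therefore appears in 7/10 of the mappings instead of the 4/5 its weight asks for, while each weight-1 device gets 13/30 instead of 2/5.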

My suspicion is that the best solution here (whatever that means!) 
leaves the CRUSH weights intact with the desired distribution, and 
then generates a set of derivative weights--probably one set for each 
round/replica/rank.
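A rough sketch of the per-rank idea, assuming a straw2-style draw (each item scores ln(u)/w and the largest score wins) with a SHA-256 stand-in for CRUSH's rjenkins hash: rank 0 uses the unmodified CRUSH weights and each later rank uses its own derivative weight set. For simplicity the sketch excludes already-chosen devices directly rather than retrying as CRUSH does. `choose_with_weight_sets` and the rank-1 numbers are hypothetical illustrations, not output from any optimizer.

```python
import hashlib
import math


def unit_hash(*parts):
    """Deterministic pseudo-random value in (0, 1]; a stand-in for CRUSH's hash."""
    digest = hashlib.sha256("/".join(map(str, parts)).encode()).digest()
    return (int.from_bytes(digest[:8], "big") + 1) / 2**64


def straw2_pick(x, rank, weights, exclude):
    """Straw2-style draw: each eligible device scores ln(u)/w; the largest wins."""
    best, best_draw = None, -math.inf
    for dev, w in weights.items():
        if dev in exclude or w <= 0:
            continue
        draw = math.log(unit_hash(x, rank, dev)) / w
        if draw > best_draw:
            best, best_draw = dev, draw
    return best


def choose_with_weight_sets(x, weight_sets):
    """Pick one device per rank for input x, using weight_sets[rank] at that rank."""
    chosen = []
    for rank, weights in enumerate(weight_sets):
        chosen.append(straw2_pick(x, rank, weights, exclude=set(chosen)))
    return chosen


# Rank 0 keeps the CRUSH weights intact; rank 1 uses an illustrative derivative
# set that boosts the heavy device to counter the multipick skew.
crush_weights = {"osd.0": 1.0, "osd.1": 1.0, "osd.2": 1.0, "osd.3": 2.0}
weight_sets = [crush_weights, {"osd.0": 0.9, "osd.1": 0.9, "osd.2": 0.9, "osd.3": 2.6}]
print(choose_with_weight_sets(42, weight_sets))
```

Because the weight sets are plain data passed to the chooser, whatever generates them can evolve without changing the selection code.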

One nice property of this is that once support is added to encode 
multiple sets of weights, the algorithm used to generate them is free to 
change and evolve independently.  (In most cases any change in 
CRUSH's mapping behavior is difficult to roll out because all 
parties participating in the cluster have to support the new behavior 
before it is enabled or used.)

> I have a question also. Is there any significant difference between
> the device selection algorithm description in the paper and its final
> implementation?

The main difference is that the "retry_bucket" behavior was found to be a 
bad idea; any collision or failed()/overload() case now triggers 
retry_descent instead.
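A simplified sketch of retry_descent for a two-level host/OSD hierarchy, loosely modeled on firstn selection: on a host collision or an out OSD, the attempt counter is bumped and the descent restarts from the root with r = rep + ftotal, rather than retrying inside the same bucket (the abandoned retry_bucket behavior). The SHA-256 hash, the `total_tries` limit (standing in for the choose_total_tries tunable) and the function names are assumptions; real CRUSH handles more cases (local retries, chooseleaf, indep mode).

```python
import hashlib
import math


def unit_hash(*parts):
    """Deterministic pseudo-random value in (0, 1]; a stand-in for CRUSH's hash."""
    digest = hashlib.sha256("/".join(map(str, parts)).encode()).digest()
    return (int.from_bytes(digest[:8], "big") + 1) / 2**64


def straw2(x, r, items):
    """Straw2-style choice among positively weighted items for input (x, r)."""
    return max(items, key=lambda name: math.log(unit_hash(x, r, name)) / items[name])


def choose_firstn(x, hosts, num_rep, is_out, total_tries=50):
    """hosts: {host: {osd: weight}}. Picks num_rep OSDs on distinct hosts."""
    host_weights = {h: sum(osds.values()) for h, osds in hosts.items()}
    chosen_hosts, result = [], []
    for rep in range(num_rep):
        for ftotal in range(total_tries):
            r = rep + ftotal  # a new r changes every hash on the path
            host = straw2(x, r, host_weights)  # always restart at the root
            osd = straw2(x, r, hosts[host])
            if host in chosen_hosts or is_out(osd):
                continue  # collision or failure: retry_descent from the root
            chosen_hosts.append(host)
            result.append(osd)
            break
        # If every try fails, this replica is skipped and the result is short.
    return result


hosts = {
    "host-a": {"osd.0": 1.0, "osd.1": 1.0},
    "host-b": {"osd.2": 1.0, "osd.3": 1.0},
    "host-c": {"osd.4": 1.0, "osd.5": 1.0},
}
print(choose_firstn(1234, hosts, 2, is_out=lambda osd: osd == "osd.0"))
```

Restarting from the root means a rejected OSD also rerolls the host choice, so a failure spreads its load across the whole hierarchy instead of onto its siblings in the same bucket.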

There are other changes, of course, but I don't think they'll impact any 
solution we come up with here (or at least any solution can be suitably 
adapted)!

sage
