Re: crush multipick anomaly

Pedro López-Adeva <plopezadeva@xxxxxxxxx> · Thu, 9 Mar 2017 09:47:44 +0100

Great, thanks for the clarifications.
I also think that the most natural way is to keep just a set of
weights in the CRUSH map and update them inside the algorithm.
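
To make the anomaly concrete, here is a minimal sketch of why a single
fixed set of weights cannot give the desired frequencies on every replica.
It assumes an idealized model (weighted sampling without replacement), not
the actual libcrush straw2/retry logic, and the function name and the
example weights are made up for illustration.

```python
from fractions import Fraction


def pick_probabilities(weights, replicas=2):
    """Exact per-rank selection probabilities for an idealized model:
    devices are drawn without replacement, each draw proportional to the
    (integer) weights of the devices not yet chosen.
    This is NOT the libcrush algorithm, only a simplified stand-in."""
    n = len(weights)
    probs = [[Fraction(0)] * n for _ in range(replicas)]

    def recurse(rank, remaining, p):
        # p is the probability of having reached this partial selection.
        if rank == replicas:
            return
        rem_total = sum(weights[i] for i in remaining)
        for i in remaining:
            q = p * Fraction(weights[i], rem_total)
            probs[rank][i] += q
            recurse(rank + 1, remaining - {i}, q)

    recurse(0, frozenset(range(n)), Fraction(1))
    return probs


# Hypothetical example: three devices with weights 2, 1, 1.
probs = pick_probabilities([2, 1, 1], replicas=2)
for rank, row in enumerate(probs):
    print(rank, [str(x) for x in row])
```

In this toy model the first replica follows the weights (1/2, 1/4, 1/4),
but the second replica comes out uniform (1/3 each), so the heavy device
ends up underused overall. That is why the weights would have to differ
for each replica.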

I'll keep working on it.

2017-03-08 0:06 GMT+01:00 Sage Weil <sage@xxxxxxxxxxxx>:
> Hi Pedro,
>
> Thanks for taking a look at this!  It's a frustrating problem and we
> haven't made much headway.
>
> On Thu, 2 Mar 2017, Pedro López-Adeva wrote:
>> Hi,
>>
>> I will have a look. BTW, I have not progressed that much but I have
>> been thinking about it. In order to adapt the previous algorithm in
>> the python notebook I need to replace the iteration over all
>> possible device permutations with an iteration over all the possible
>> selections that crush would make. That is the main thing I need to
>> work on.
>>
>> The other thing is of course that weights change for each replica.
>> That is, they cannot be really fixed in the crush map. So the
>> algorithm inside libcrush, not only the weights in the map, needs to be
>> changed. The weights in the crush map should reflect then, maybe, the
>> desired usage frequencies. Or maybe each replica should have its own
>> crush map, but then the information about the previous selection
>> should be passed to the next replica placement run so it avoids
>> selecting the same one again.
>
> My suspicion is that the best solution here (whatever that means!)
> leaves the CRUSH weights intact with the desired distribution, and
> then generates a set of derivative weights--probably one set for each
> round/replica/rank.
>
> One nice property of this is that once the support is added to encode
> multiple sets of weights, the algorithm used to generate them is free to
> change and evolve independently.  (In most cases any change in
> CRUSH's mapping behavior is difficult to roll out because all
> parties participating in the cluster have to support any new behavior
> before it is enabled or used.)
>
>> I have a question also. Is there any significant difference between
>> the device selection algorithm description in the paper and its final
>> implementation?
>
> The main difference is the "retry_bucket" behavior was found to be a bad
> idea; any collision or failed()/overload() case triggers the
> retry_descent.
>
> There are other changes, of course, but I don't think they'll impact any
> solution we come up with here (or at least any solution can be suitably
> adapted)!
>
> sage