Re: crush multipick anomaly

Loic Dachary <loic@xxxxxxxxxxx> · Sat, 18 Mar 2017 10:21:38 +0100

Hi Pedro,

I'm going to experiment with what you did at

https://github.com/plafl/notebooks/blob/master/replication.ipynb

and the latest python-crush published today. A comparison function was added that will help measure the data movement. I'm hoping we can release an offline tool based on your solution. Please let me know if I should wait before diving into this, in case you have unpublished drafts or new ideas.
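
For illustration, here is a minimal sketch of what measuring data movement between two mappings could look like. It is not the python-crush comparison function: the name data_movement and the mapping format (a dict from object name to the list of devices holding its replicas) are assumptions made up for this example.

```python
def data_movement(before, after):
    """Return the fraction of replicas that left their device.

    before, after: dict mapping an object name to the list of devices
    holding its replicas (hypothetical format, not python-crush's).
    """
    moved = 0
    total = 0
    for obj, old_devices in before.items():
        new_devices = after[obj]
        total += len(old_devices)
        # A replica counts as moved when its old device no longer
        # holds any replica of this object in the new mapping.
        moved += len(set(old_devices) - set(new_devices))
    return moved / total


# Device 2 is replaced by device 3 in this toy example.
before = {"a": [0, 1], "b": [1, 2], "c": [2, 0]}
after = {"a": [0, 1], "b": [1, 3], "c": [3, 0]}
print(data_movement(before, after))
```

Counting by set difference ignores replicas that only change rank within the same object, which may or may not be what a real tool should report.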

Cheers

On 03/09/2017 09:47 AM, Pedro López-Adeva wrote:
> Great, thanks for the clarifications.
> I also think that the most natural way is to keep just a set of
> weights in the CRUSH map and update them inside the algorithm.
> 
> I'll keep working on it.
> 
> 
> 2017-03-08 0:06 GMT+01:00 Sage Weil <sage@xxxxxxxxxxxx>:
>> Hi Pedro,
>>
>> Thanks for taking a look at this!  It's a frustrating problem and we
>> haven't made much headway.
>>
>> On Thu, 2 Mar 2017, Pedro López-Adeva wrote:
>>> Hi,
>>>
>>> I will have a look. BTW, I have not progressed that much but I have
>>> been thinking about it. In order to adapt the previous algorithm in
>>> the python notebook I need to replace the iteration over all
>>> possible device permutations with an iteration over all the possible
>>> selections that crush would make. That is the main thing I need to
>>> work on.
>>>
>>> The other thing is of course that weights change for each replica.
>>> That is, they cannot be really fixed in the crush map. So the
>>> algorithm inside libcrush, not only the weights in the map, needs to be
>>> changed. The weights in the crush map should then perhaps reflect the
>>> desired usage frequencies. Or maybe each replica should have its own
>>> crush map, but then the information about the previous selection
>>> should be passed to the next replica placement run so it avoids
>>> selecting the same one again.
>>
>> My suspicion is that the best solution here (whatever that means!)
>> leaves the CRUSH weights intact with the desired distribution, and
>> then generates a set of derivative weights--probably one set for each
>> round/replica/rank.
>>
>> One nice property of this is that once the support is added to encode
>> multiple sets of weights, the algorithm used to generate them is free to
>> change and evolve independently.  (In most cases any change in
>> CRUSH's mapping behavior is difficult to roll out because all
>> parties participating in the cluster have to support any new behavior
>> before it is enabled or used.)
>>
>>> I have a question also. Is there any significant difference between
>>> the device selection algorithm description in the paper and its final
>>> implementation?
>>
>> The main difference is that the "retry_bucket" behavior was found to be a bad
>> idea; any collision or failed()/overload() case triggers the
>> retry_descent.
>>
>> There are other changes, of course, but I don't think they'll impact any
>> solution we come up with here (or at least any solution can be suitably
>> adapted)!
>>
>> sage
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
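
The point in the quoted thread that the weights effectively change for each replica can be made concrete with a small exact computation. The sketch below is an idealized model, not libcrush: it assumes each replica is drawn with probability proportional to its weight among the devices not yet selected, and it enumerates all device permutations as the notebook approach does. The function name rank_distributions is made up for this example.

```python
from fractions import Fraction
from itertools import permutations


def rank_distributions(weights, replicas):
    """Exact selection probability of each device at each replica rank.

    weights: dict mapping a device to a positive integer weight.
    Model (an assumption, not libcrush): weighted draws without
    replacement, renormalized over the devices still available.
    """
    devices = list(weights)
    dist = [{d: Fraction(0) for d in devices} for _ in range(replicas)]
    for selection in permutations(devices, replicas):
        remaining = sum(weights.values())
        prob = Fraction(1)
        for d in selection:
            # Probability of drawing d among the devices still available.
            prob *= Fraction(weights[d], remaining)
            remaining -= weights[d]
        for rank, d in enumerate(selection):
            dist[rank][d] += prob
    return dist


# Device 0 has twice the weight of devices 1 and 2.
dist = rank_distributions({0: 2, 1: 1, 2: 1}, replicas=2)
for rank, probabilities in enumerate(dist):
    print(rank, probabilities)
```

With these weights device 0 is picked first with probability 1/2, as desired, but second with probability only 1/3, so over both replicas it receives 5/12 of the data instead of 1/2. A separate set of weights per rank is one way to compensate for that drop.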

-- 
Loïc Dachary, Artisan Logiciel Libre