Re: storing multiple weights in crush

Loic Dachary <loic@xxxxxxxxxxx> · Tue, 4 Apr 2017 12:10:17 +0200

Hi Sage,

I drafted an implementation[0] based on the idea that the weight set is just an argument to crush_do_rule and updated the pad[1] with information about it. Looking forward to discussing this further tomorrow during CDM[2] :-)

Cheers

[0] draft implementation http://libcrush.org/main/libcrush/merge_requests/30/diffs
[1] http://pad.ceph.com/p/crush-multiweight
[2] CDM http://tracker.ceph.com/projects/ceph/wiki/CDM_05-APR-2017

On 03/28/2017 12:27 PM, Loic Dachary wrote:
> 
> 
> On 03/28/2017 01:16 AM, Sage Weil wrote:
>> On Mon, 27 Mar 2017, Loic Dachary wrote:
>>> Hi Sage,
>>>
>>> I added a proposal[1] to the next CDM[2] to be able to store and use 
>>> multiple weights for each item in the crushmap.
>>>
>>> I'm not sure about the naming (probabilities is a long word...) but it 
>>> would be good to not have a third variable named weight with yet another 
>>> meaning. Although we could modify the builder functions, I'm not sure 
>>> this is a good idea. They are not complete enough to hide the crush data 
>>> structures and it is probably enough to document the new data members. 
>>> And the caller can allocate it and fill it after creating a bucket. In 
>>> practice that is going to be used by CrushWrapper or python-crush which 
>>> will define an API.
>>>
>>> It is probably best if the crush modification is the bare minimum:
>>>
>>> - allowing the choose function to pick a probability from a table 
>>> indexed by the round instead of relying on a single weight
>>> - storing the probability table in each item
>>>
>>> Feel free to modify the pad if you have other ideas, this is a first 
>>> draft done today and I won't be offended if it is discarded ;-)
>>
>> What about a new concept called a "weight set" that represents an 
>> alternative set of weights for the entire map (or perhaps a subset of it, 
>> like a particular hierarchy of interest).  We allow do_rule to be called 
>> with no weight_set specified, in which case it uses the 
>> legacy/default/canonical weights.  Or if an alternative weight set is 
>> specified, those weights are used instead.
>>
>> I think we have a couple interesting options for the representation:
>>
>> 1) It could be as simple as
>>
>>       vector<int> bucket_weights, item_weights;
>>
>> which would lose the current (mis?)feature where the weigth for an item 
>> may be different in different buckets.  Or,
>>
>> 2) it could mirror the bucket item_weights (let's call it vector<int> 
>> item_weights even though it's really a C array).  Something like
>>
>>       vector<int,vector<int>> bucket_item_weights;
>>
>> where it's indexed first by bucket id, then the position in the bucket.
>>
>> I guess #2 is most general and mirrors what we have.  That would give us 
>> something like
>>
>> struct weight_set {
>>   vector<int,vector<int>> bucket_item_weights;
>> };
>>
>> for a simple replacement weight. But we actually want 
>> per-position weights, so instead we can do
>>
>> struct weight_set {
>>   vector<int,vector<vector<int>>> bucket_item_position_weights;
>> };
>>
>> where it's bucket id, item position, current position.  In reality, 
>> for each bucket, we probably want a flat 2-d array, something like
>>
>> struct weight_set {
>>   struct bucket_weights {
>>     unt32_t num_positions, num_items;
>>     uint32_t *data;
>>   };
>>   vector<int,bucket_weights> bucket_weights;
>> };
>>
>> or whatever.
>>
>> The last thing is that the number of per-position weights we need is 
>> a property of the crush rule we're using, not the bucket itself.  So we 
>> should probably calculate it out for however many we think we need, and if 
>> we are doing a placement past that point, keep using the last weight.  
>> That means the degenerate case of just 1 position means we'll use it for 
>> all positions, which is what we'd want in the simple case anyway.
> 
> I like this. The crush_do_rule could have an additional argument, a pointer to the weight set. If NULL it falls back to what we currently have. It is then up to the caller to build such a weight set. It can be addressing only part of the bucket tree or all of it, with num_positions matching the outsize. It is better seen as an argument to crush_do_rule rather than an integral part of the crushmap.
> 
> Does that sound sensible ?
> 
>>
>> The last thing I think we should do while we're making this change is 
>> also let you specify an alternative id for the bucket when doing the hash.  
>> This is currently an annoying property of crush where if the bucket id 
>> changes all data shuffles.  But we could add an int32_t id to the 
>> bucket_weights above so that you could have a new weight set using any id 
>> you want.. that'll be useful for lots of stuff I suspect.
> 
> Ok.
> 
> Cheers
> 
>>
>> sage
>>
>>
>>
>>
>>>
>>> Cheers
>>>
>>> [1] http://pad.ceph.com/p/crush-multiweight
>>> [2] http://tracker.ceph.com/projects/ceph/wiki/CDM_05-APR-2017
>>>
>>> -- 
>>> Loïc Dachary, Artisan Logiciel Libre
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
> 

-- 
Loïc Dachary, Artisan Logiciel Libre
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html