Re: storing multiple weights in crush

Sage Weil <sweil@xxxxxxxxxx> · Mon, 27 Mar 2017 23:16:39 +0000 (UTC)

On Mon, 27 Mar 2017, Loic Dachary wrote:
> Hi Sage,
> 
> I added a proposal[1] to the next CDM[2] to be able to store and use 
> multiple weights for each item in the crushmap.
> 
> I'm not sure about the naming (probabilities is a long word...) but it 
> would be good to not have a third variable named weight with yet another 
> meaning. Although we could modify the builder functions, I'm not sure 
> this is a good idea. They are not complete enough to hide the crush data 
> structures and it is probably enough to document the new data members. 
> And the caller can allocate it and fill it after creating a bucket. In 
> practice that is going to be used by CrushWrapper or python-crush which 
> will define an API.
> 
> It is probably best if the crush modification is the bare minimum:
> 
> - allowing the choose function to pick a probability from a table 
> indexed by the round instead of relying on a single weight
> - storing the probability table in each item
> 
> Feel free to modify the pad if you have other ideas, this is a first 
> draft done today and I won't be offended if it is discarded ;-)

What about a new concept called a "weight set" that represents an 
alternative set of weights for the entire map (or perhaps a subset of it, 
like a particular hierarchy of interest).  We allow do_rule to be called 
with no weight_set specified, in which case it uses the 
legacy/default/canonical weights.  Or if an alternative weight set is 
specified, those weights are used instead.

I think we have a couple interesting options for the representation:

1) It could be as simple as

      vector<int> bucket_weights, item_weights;

which would lose the current (mis?)feature where the weigth for an item 
may be different in different buckets.  Or,

2) it could mirror the bucket item_weights (let's call it vector<int> 
item_weights even though it's really a C array).  Something like

      vector<int,vector<int>> bucket_item_weights;

where it's indexed first by bucket id, then the position in the bucket.

I guess #2 is most general and mirrors what we have.  That would give us 
something like

struct weight_set {
  vector<int,vector<int>> bucket_item_weights;
};

for a simple replacement weight. But we actually want 
per-position weights, so instead we can do

struct weight_set {
  vector<int,vector<vector<int>>> bucket_item_position_weights;
};

where it's bucket id, item position, current position.  In reality, 
for each bucket, we probably want a flat 2-d array, something like

struct weight_set {
  struct bucket_weights {
    unt32_t num_positions, num_items;
    uint32_t *data;
  };
  vector<int,bucket_weights> bucket_weights;
};

or whatever.

The last thing is that the number of per-position weights we need is 
a property of the crush rule we're using, not the bucket itself.  So we 
should probably calculate it out for however many we think we need, and if 
we are doing a placement past that point, keep using the last weight.  
That means the degenerate case of just 1 position means we'll use it for 
all positions, which is what we'd want in the simple case anyway.

The last thing I think we should do while we're making this change is 
also let you specify an alternative id for the bucket when doing the hash.  
This is currently an annoying property of crush where if the bucket id 
changes all data shuffles.  But we could add an int32_t id to the 
bucket_weights above so that you could have a new weight set using any id 
you want.. that'll be useful for lots of stuff I suspect.

sage

> 
> Cheers
> 
> [1] http://pad.ceph.com/p/crush-multiweight
> [2] http://tracker.ceph.com/projects/ceph/wiki/CDM_05-APR-2017
> 
> -- 
> Loïc Dachary, Artisan Logiciel Libre
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
>