On 03/28/2017 01:16 AM, Sage Weil wrote: > On Mon, 27 Mar 2017, Loic Dachary wrote: >> Hi Sage, >> >> I added a proposal[1] to the next CDM[2] to be able to store and use >> multiple weights for each item in the crushmap. >> >> I'm not sure about the naming (probabilities is a long word...) but it >> would be good to not have a third variable named weight with yet another >> meaning. Although we could modify the builder functions, I'm not sure >> this is a good idea. They are not complete enough to hide the crush data >> structures and it is probably enough to document the new data members. >> And the caller can allocate it and fill it after creating a bucket. In >> practice that is going to be used by CrushWrapper or python-crush which >> will define an API. >> >> It is probably best if the crush modification is the bare minimum: >> >> - allowing the choose function to pick a probability from a table >> indexed by the round instead of relying on a single weight >> - storing the probability table in each item >> >> Feel free to modify the pad if you have other ideas, this is a first >> draft done today and I won't be offended if it is discarded ;-) > > What about a new concept called a "weight set" that represents an > alternative set of weights for the entire map (or perhaps a subset of it, > like a particular hierarchy of interest). We allow do_rule to be called > with no weight_set specified, in which case it uses the > legacy/default/canonical weights. Or if an alternative weight set is > specified, those weights are used instead. > > I think we have a couple interesting options for the representation: > > 1) It could be as simple as > > vector<int> bucket_weights, item_weights; > > which would lose the current (mis?)feature where the weigth for an item > may be different in different buckets. Or, > > 2) it could mirror the bucket item_weights (let's call it vector<int> > item_weights even though it's really a C array). Something like > > vector<int,vector<int>> bucket_item_weights; > > where it's indexed first by bucket id, then the position in the bucket. > > I guess #2 is most general and mirrors what we have. That would give us > something like > > struct weight_set { > vector<int,vector<int>> bucket_item_weights; > }; > > for a simple replacement weight. But we actually want > per-position weights, so instead we can do > > struct weight_set { > vector<int,vector<vector<int>>> bucket_item_position_weights; > }; > > where it's bucket id, item position, current position. In reality, > for each bucket, we probably want a flat 2-d array, something like > > struct weight_set { > struct bucket_weights { > unt32_t num_positions, num_items; > uint32_t *data; > }; > vector<int,bucket_weights> bucket_weights; > }; > > or whatever. > > The last thing is that the number of per-position weights we need is > a property of the crush rule we're using, not the bucket itself. So we > should probably calculate it out for however many we think we need, and if > we are doing a placement past that point, keep using the last weight. > That means the degenerate case of just 1 position means we'll use it for > all positions, which is what we'd want in the simple case anyway. I like this. The crush_do_rule could have an additional argument, a pointer to the weight set. If NULL it falls back to what we currently have. It is then up to the caller to build such a weight set. It can be addressing only part of the bucket tree or all of it, with num_positions matching the outsize. It is better seen as an argument to crush_do_rule rather than an integral part of the crushmap. Does that sound sensible ? > > The last thing I think we should do while we're making this change is > also let you specify an alternative id for the bucket when doing the hash. > This is currently an annoying property of crush where if the bucket id > changes all data shuffles. But we could add an int32_t id to the > bucket_weights above so that you could have a new weight set using any id > you want.. that'll be useful for lots of stuff I suspect. Ok. Cheers > > sage > > > > >> >> Cheers >> >> [1] http://pad.ceph.com/p/crush-multiweight >> [2] http://tracker.ceph.com/projects/ceph/wiki/CDM_05-APR-2017 >> >> -- >> Loïc Dachary, Artisan Logiciel Libre >> -- >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >> the body of a message to majordomo@xxxxxxxxxxxxxxx >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> -- Loïc Dachary, Artisan Logiciel Libre -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html