Hi Sage, I drafted an implementation[0] based on the idea that the weight set is just an argument to crush_do_rule and updated the pad[1] with information about it. Looking forward to discussing this further tomorrow during CDM[2] :-) Cheers [0] draft implementation http://libcrush.org/main/libcrush/merge_requests/30/diffs [1] http://pad.ceph.com/p/crush-multiweight [2] CDM http://tracker.ceph.com/projects/ceph/wiki/CDM_05-APR-2017 On 03/28/2017 12:27 PM, Loic Dachary wrote: > > > On 03/28/2017 01:16 AM, Sage Weil wrote: >> On Mon, 27 Mar 2017, Loic Dachary wrote: >>> Hi Sage, >>> >>> I added a proposal[1] to the next CDM[2] to be able to store and use >>> multiple weights for each item in the crushmap. >>> >>> I'm not sure about the naming (probabilities is a long word...) but it >>> would be good to not have a third variable named weight with yet another >>> meaning. Although we could modify the builder functions, I'm not sure >>> this is a good idea. They are not complete enough to hide the crush data >>> structures and it is probably enough to document the new data members. >>> And the caller can allocate it and fill it after creating a bucket. In >>> practice that is going to be used by CrushWrapper or python-crush which >>> will define an API. >>> >>> It is probably best if the crush modification is the bare minimum: >>> >>> - allowing the choose function to pick a probability from a table >>> indexed by the round instead of relying on a single weight >>> - storing the probability table in each item >>> >>> Feel free to modify the pad if you have other ideas, this is a first >>> draft done today and I won't be offended if it is discarded ;-) >> >> What about a new concept called a "weight set" that represents an >> alternative set of weights for the entire map (or perhaps a subset of it, >> like a particular hierarchy of interest). We allow do_rule to be called >> with no weight_set specified, in which case it uses the >> legacy/default/canonical weights. Or if an alternative weight set is >> specified, those weights are used instead. >> >> I think we have a couple interesting options for the representation: >> >> 1) It could be as simple as >> >> vector<int> bucket_weights, item_weights; >> >> which would lose the current (mis?)feature where the weigth for an item >> may be different in different buckets. Or, >> >> 2) it could mirror the bucket item_weights (let's call it vector<int> >> item_weights even though it's really a C array). Something like >> >> vector<int,vector<int>> bucket_item_weights; >> >> where it's indexed first by bucket id, then the position in the bucket. >> >> I guess #2 is most general and mirrors what we have. That would give us >> something like >> >> struct weight_set { >> vector<int,vector<int>> bucket_item_weights; >> }; >> >> for a simple replacement weight. But we actually want >> per-position weights, so instead we can do >> >> struct weight_set { >> vector<int,vector<vector<int>>> bucket_item_position_weights; >> }; >> >> where it's bucket id, item position, current position. In reality, >> for each bucket, we probably want a flat 2-d array, something like >> >> struct weight_set { >> struct bucket_weights { >> unt32_t num_positions, num_items; >> uint32_t *data; >> }; >> vector<int,bucket_weights> bucket_weights; >> }; >> >> or whatever. >> >> The last thing is that the number of per-position weights we need is >> a property of the crush rule we're using, not the bucket itself. So we >> should probably calculate it out for however many we think we need, and if >> we are doing a placement past that point, keep using the last weight. >> That means the degenerate case of just 1 position means we'll use it for >> all positions, which is what we'd want in the simple case anyway. > > I like this. The crush_do_rule could have an additional argument, a pointer to the weight set. If NULL it falls back to what we currently have. It is then up to the caller to build such a weight set. It can be addressing only part of the bucket tree or all of it, with num_positions matching the outsize. It is better seen as an argument to crush_do_rule rather than an integral part of the crushmap. > > Does that sound sensible ? > >> >> The last thing I think we should do while we're making this change is >> also let you specify an alternative id for the bucket when doing the hash. >> This is currently an annoying property of crush where if the bucket id >> changes all data shuffles. But we could add an int32_t id to the >> bucket_weights above so that you could have a new weight set using any id >> you want.. that'll be useful for lots of stuff I suspect. > > Ok. > > Cheers > >> >> sage >> >> >> >> >>> >>> Cheers >>> >>> [1] http://pad.ceph.com/p/crush-multiweight >>> [2] http://tracker.ceph.com/projects/ceph/wiki/CDM_05-APR-2017 >>> >>> -- >>> Loïc Dachary, Artisan Logiciel Libre >>> -- >>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>> the body of a message to majordomo@xxxxxxxxxxxxxxx >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>> > -- Loïc Dachary, Artisan Logiciel Libre -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html