Great, thanks for the clarifications. I also think that the most natural
way is to keep just a set of weights in the CRUSH map and update them
inside the algorithm.

I'll keep working on it.

2017-03-08 0:06 GMT+01:00 Sage Weil <sage@xxxxxxxxxxxx>:
> Hi Pedro,
>
> Thanks for taking a look at this! It's a frustrating problem and we
> haven't made much headway.
>
> On Thu, 2 Mar 2017, Pedro López-Adeva wrote:
>> Hi,
>>
>> I will have a look. BTW, I have not progressed that much but I have
>> been thinking about it. In order to adapt the previous algorithm in
>> the Python notebook I need to replace the iteration over all
>> possible device permutations with an iteration over all the possible
>> selections that crush would make. That is the main thing I need to
>> work on.
>>
>> The other thing is of course that weights change for each replica.
>> That is, they cannot really be fixed in the crush map. So the
>> algorithm inside libcrush, not only the weights in the map, needs to
>> be changed. The weights in the crush map should then reflect, maybe,
>> the desired usage frequencies. Or maybe each replica should have its
>> own crush map, but then the information about the previous selection
>> should be passed to the next replica placement run so it avoids
>> selecting the same one again.
>
> My suspicion is that the best solution here (whatever that means!)
> leaves the CRUSH weights intact with the desired distribution, and
> then generates a set of derivative weights--probably one set for each
> round/replica/rank.
>
> One nice property of this is that once the support is added to encode
> multiple sets of weights, the algorithm used to generate them is free to
> change and evolve independently. (In most cases any change in
> CRUSH's mapping behavior is difficult to roll out because all
> parties participating in the cluster have to support any new behavior
> before it is enabled or used.)
>
>> I have a question also. Is there any significant difference between
>> the device selection algorithm description in the paper and its final
>> implementation?
>
> The main difference is that the "retry_bucket" behavior was found to
> be a bad idea; any collision or failed()/overload() case triggers the
> retry_descent.
>
> There are other changes, of course, but I don't think they'll impact any
> solution we come up with here (or at least any solution can be suitably
> adapted)!
>
> sage
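
To make the "one set of derivative weights per round/replica/rank" idea
concrete, here is a minimal Python sketch. It is not libcrush code and
makes several assumptions: the names _unit_hash, straw2_choose and place
are hypothetical, SHA-256 stands in for CRUSH's own hash, a single flat
bucket is used instead of a hierarchy, and the selection is a
straw2-style draw (ln(u) / weight, largest wins). A collision with an
already chosen item simply retries with a new attempt number, loosely
mirroring the retry-on-collision behavior described above.

```python
import hashlib
import math


def _unit_hash(x, item, r):
    """Map (input, item, attempt) to a float in (0, 1].

    Stand-in for CRUSH's hash; SHA-256 is an assumption of this sketch.
    """
    digest = hashlib.sha256(f"{x}:{item}:{r}".encode()).digest()
    n = int.from_bytes(digest[:8], "big")
    return (n + 1) / 2**64


def straw2_choose(x, weights, r):
    """Pick one item with probability proportional to its weight."""
    best, best_draw = None, -math.inf
    for item, w in weights.items():
        if w <= 0:
            continue  # zero-weight items are never selected
        draw = math.log(_unit_hash(x, item, r)) / w
        if draw > best_draw:
            best, best_draw = item, draw
    return best


def place(x, weight_sets, max_tries=50):
    """Choose one distinct item per position.

    weight_sets[i] is the (derivative) weight set used for position i.
    A position is left out of the result if max_tries is exhausted.
    """
    chosen = []
    for position, weights in enumerate(weight_sets):
        for attempt in range(max_tries):
            # Distinct r for every (position, attempt) pair.
            r = position + attempt * len(weight_sets)
            item = straw2_choose(x, weights, r)
            if item is not None and item not in chosen:
                chosen.append(item)
                break
    return chosen
```

For example, place(42, [{"a": 1, "b": 1, "c": 1}, {"a": 0, "b": 2, "c": 1}])
uses the first weight set for the primary and the second for the next
replica, so "a" can only ever appear in the first position. How the
per-position weight sets should be derived from the target weights is
exactly the open question in this thread and is not addressed here.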