On Sun, 15 Apr 2012, wrote:
> Hi everyone.
>
> One question is about performance of bucket algorithms, it seems that
> on every aspect, straw bucket is at least not worse than list bucket.
> So does it mean we shall always use straw bucket instead of list
> bucket?

Well, list on average will calculate a hash for 1/2 of the items, while
straw always does every item.

That said, we use straw by default for all buckets.  As long as there is
_some_ hierarchy, the individual buckets don't get too big, and CRUSH is
so cheap anyway that it doesn't really matter... especially when compared
to the cost of moving data due to a less-than-optimal remapping.

> I was browsing branch tree of ceph to trace back earlier versions of
> the implementation. I noticed that crush_uniform_bucket_choose has
> been significantly changed. I read the comment which said the original
> method (seems to be described in the thesis) is not random enough and
> has some bad behavior, but I don't know what kind of bad behavior is
> that. For what reason a random permutation is a better choice?

If I remember correctly, the problem was that it would favor certain
nodes when there were too many failures, or certain patterns of failures,
within the bucket (I don't remember the details).  The original algorithm
was supposed to give you a random permutation too, but it didn't always
do that.  I think that code is also used as a fallback if the sampling
fails to return a good value after too many tries.

This will be one of the things we look at in a month or two when we take
a careful look at CRUSH.

sage
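
To make the list-versus-straw cost point above concrete, here is a rough
Python sketch that counts hash computations per selection.  It is not the
Ceph code: hash32, list_choose and straw_choose are hypothetical names, the
hash is a SHA-256 stand-in for CRUSH's own hash, and the straw length is
simply hash * weight, which is only reasonable for the equal weights used
here.

```python
import hashlib


def hash32(*parts):
    """Deterministic 32-bit hash of the inputs (stand-in, not CRUSH's hash)."""
    data = ",".join(str(p) for p in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")


def list_choose(weights, x, r=0):
    """Simplified list bucket: walk from the newest item toward the oldest.

    Returns (chosen index, number of hashes computed).
    """
    hashes = 0
    total = sum(weights)  # combined weight of items 0..i
    for i in range(len(weights) - 1, -1, -1):
        hashes += 1
        # Scale the 32-bit hash into [0, total) and stop at the first hit.
        draw = (hash32(x, i, r) * total) >> 32
        if draw < weights[i]:
            return i, hashes
        total -= weights[i]
    return 0, hashes  # only reached if item 0 has zero weight


def straw_choose(weights, x, r=0):
    """Simplified straw bucket: every item draws a straw, the longest wins.

    Returns (chosen index, number of hashes computed).
    """
    hashes = 0
    best, best_draw = 0, -1
    for i, w in enumerate(weights):
        hashes += 1
        draw = hash32(x, i, r) * w  # assumption: straw length = hash * weight
        if draw > best_draw:
            best, best_draw = i, draw
    return best, hashes


weights = [1] * 16   # 16 equally weighted items
n_inputs = 10000
list_avg = sum(list_choose(weights, x)[1] for x in range(n_inputs)) / n_inputs
straw_avg = sum(straw_choose(weights, x)[1] for x in range(n_inputs)) / n_inputs
print("list: avg hashes per choice =", list_avg)
print("straw: avg hashes per choice =", straw_avg)
```

With equal weights the list walk should stop, on average, about halfway
through the bucket, while the straw loop always touches every item.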