On Sat, 21 Apr 2012, wrote: > Oh, perhaps I find the reason for the bad behavior. > > According to Jenkins hash algorithm, any number of lower bits of the > hash result is uniformly distributed, but the modular arithmetic can > destroy the uniform distribution, unless the size of the bucket is the > power of 2! > > Am I right? Maybe, but I'm not ready to blame the hash yet. Is there a specific map you're looking at? Can you attach that so we can be sure we're talking about the same thing? sage > > On Tue, Apr 17, 2012 at 5:29 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote: > > On Sun, 15 Apr 2012, wrote: > >> Hi everyone. > >> > >> One question is about performance of bucket algorithms, it seems that > >> on every aspect, straw bucket is at least not worse than list bucket. > >> So does it mean we shall always use straw bucket instead of list > >> bucket? > > > > Well, list on average will calculate a hash for 1/2 of the items, while > > straw always does every item. That said, we use straw by default for all > > buckets. As long as there is _some_ hierarchy, the individual buckets > > don't get too big, and CRUSH is so cheap anyway that it doesn't really > > matter... especially when compared to the cost of moving data due to a > > less-than-optimal remapping. > > > >> I was browsing branch tree of ceph to trace back earlier versions of > >> the implementation. I noticed that crush_uniform_bucket_choose has > >> been significantly changed. I read the comment which said the original > >> method (seems to be described in the thesis) is not random enough and > >> has some bad behavior, but I don't know what kind of bad behavior is > >> that. For what reason a random permutation is a better choice? > > > > If I remember correctly, the problem was that it would favor certain nodes > > when there were too many or certain patterns of failures within the > > bucket (I don't remember the details). The original algorithmw as > > supposed to give you a random permutation too, but it didn't always do > > that. > > > > I think that code is also used as a fallback if the sampling fails to > > return a good value after too many tries. > > > > This will be one of the things we look at in a month or two when we > > take a careful look at CRUSH. > > > > sage > -- > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html > >