Re: Questions about bucket algorithm and bucket choose

Sage Weil <sage@xxxxxxxxxxxx> · Sat, 21 Apr 2012 21:21:25 -0700 (PDT)

On Sat, 21 Apr 2012,  wrote:
> Oh, perhaps I find the reason for the bad behavior.
> 
> According to Jenkins hash algorithm, any number of lower bits of the
> hash result is uniformly distributed, but the modular arithmetic can
> destroy the uniform distribution, unless the size of the bucket is the
> power of 2!
> 
> Am I right?

Maybe, but I'm not ready to blame the hash yet.  Is there a specific map 
you're looking at?  Can you attach that so we can be sure we're talking 
about the same thing?

sage

> 
> On Tue, Apr 17, 2012 at 5:29 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> > On Sun, 15 Apr 2012,  wrote:
> >> Hi everyone.
> >>
> >> One question is about performance of bucket algorithms, it seems that
> >> on every aspect, straw bucket is at least not worse than list bucket.
> >> So does it mean we shall always use straw bucket instead of list
> >> bucket?
> >
> > Well, list on average will calculate a hash for 1/2 of the items, while
> > straw always does every item.  That said, we use straw by default for all
> > buckets.  As long as there is _some_ hierarchy, the individual buckets
> > don't get too big, and CRUSH is so cheap anyway that it doesn't really
> > matter... especially when compared to the cost of moving data due to a
> > less-than-optimal remapping.
> >
> >> I was browsing branch tree of ceph to trace back earlier versions of
> >> the implementation. I noticed that crush_uniform_bucket_choose has
> >> been significantly changed. I read the comment which said the original
> >> method (seems to be described in the thesis) is not random enough and
> >> has some bad behavior, but I don't know what kind of bad behavior is
> >> that. For what reason a random permutation is a better choice?
> >
> > If I remember correctly, the problem was that it would favor certain nodes
> > when there were too many or certain patterns of failures within the
> > bucket (I don't remember the details).  The original algorithmw as
> > supposed to give you a random permutation too, but it didn't always do
> > that.
> >
> > I think that code is also used as a fallback if the sampling fails to
> > return a good value after too many tries.
> >
> > This will be one of the things we look at in a month or two when we
> > take a careful look at CRUSH.
> >
> > sage
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
>