Re: Questions about bucket algorithm and bucket choose

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Oh, perhaps I find the reason for the bad behavior.

According to Jenkins hash algorithm, any number of lower bits of the
hash result is uniformly distributed, but the modular arithmetic can
destroy the uniform distribution, unless the size of the bucket is the
power of 2!

Am I right?

On Tue, Apr 17, 2012 at 5:29 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> On Sun, 15 Apr 2012,  wrote:
>> Hi everyone.
>>
>> One question is about performance of bucket algorithms, it seems that
>> on every aspect, straw bucket is at least not worse than list bucket.
>> So does it mean we shall always use straw bucket instead of list
>> bucket?
>
> Well, list on average will calculate a hash for 1/2 of the items, while
> straw always does every item.  That said, we use straw by default for all
> buckets.  As long as there is _some_ hierarchy, the individual buckets
> don't get too big, and CRUSH is so cheap anyway that it doesn't really
> matter... especially when compared to the cost of moving data due to a
> less-than-optimal remapping.
>
>> I was browsing branch tree of ceph to trace back earlier versions of
>> the implementation. I noticed that crush_uniform_bucket_choose has
>> been significantly changed. I read the comment which said the original
>> method (seems to be described in the thesis) is not random enough and
>> has some bad behavior, but I don't know what kind of bad behavior is
>> that. For what reason a random permutation is a better choice?
>
> If I remember correctly, the problem was that it would favor certain nodes
> when there were too many or certain patterns of failures within the
> bucket (I don't remember the details).  The original algorithmw as
> supposed to give you a random permutation too, but it didn't always do
> that.
>
> I think that code is also used as a fallback if the sampling fails to
> return a good value after too many tries.
>
> This will be one of the things we look at in a month or two when we
> take a careful look at CRUSH.
>
> sage
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux