Re: trying to understanding crush more deeply

Will Zhao <zhao6305@xxxxxxxxx> · Sat, 23 Sep 2017 00:05:14 +0800

Thanks !   I still have a question. Like the code in bucket_straw2_choose below：

u = crush_hash32_3(bucket->h.hash, x, ids[i], r);
u &= 0xffff;
ln = crush_ln(u) - 0x1000000000000ll;
draw = div64_s64(ln, weights[i]);

Because the x , id, r , don't change, so the ln won't change for old
bucket, add osd or remove osd only change the weight.  Suppose for
pgi, there are 3 bucket(host) with weight 3w, add one host with weight
w, there are 4 buckets with weight now.  This means the movement will
depend on  ln value , am I understand right ? I don't understand how
this make sure the new bucket get desirable pgs ?  I read
https://en.wikipedia.org/wiki/Jenkins_hash_function and
http://en.wikipedia.org/wiki/Exponential_distribution,  But I can't
link them  together to understand this ? Can you explain something
about this ?  Apologize for my dummy. And thank you very much  .  : )

On Fri, Sep 22, 2017 at 3:50 PM, Maged Mokhtar <mmokhtar@xxxxxxxxxxx> wrote:
> Per section 3.4.4 The default bucket type straw computes the hash of (PG
> number, replica number, bucket id) for all buckets using the Jenkins integer
> hashing function, then multiply this by bucket weight (for OSD disks the
> weight of 1 is for 1 TB, for higher level it is the sum of contained
> weights). The selection function chooses the bucket/disk with the max value:
> c(r,x) = maxi ( f (wi)hash(x, r, i))
>
> So if you add a OSD disk, there is a new disk id that enters this
> competition and will get PG from other OSDs proportional to its weight,
> which is a desirable effect, but a side effect is that the weight hierarchy
> has slightly changed so now some older buckets may win PGs from other older
> buckets according to the hash function.
>
> So straw does have overhead when adding (rather than replacing), it does not
> do minimal PG re-assignments. But it terms of overall efficiency of
> adding/removing of buckets at end and in middle of hierarchy it is the best
> overall over other algorithms as seen on chart 5 and table 2.
>
> On 2017-09-22 08:36, Will Zhao wrote:
>
> Hi Sage  and all :
>     I am tring to understand cursh more deeply. I have tried to read
> the code and paper, and search the mail list archives ,  but I still
> have some questions and can't understand it well.
>     If I have 100 osds, and when I add a osd ,  the osdmap changes,
> and how the pg is recaulated to make sure the data movement is
> minimal.  I tried to use crushtool --show-mappings --num-rep 3  --test
>  -i map , through changing the map for 100osds and 101 osds , to look
> the result , it looks like the pgmap changed a lot .  Shouldn't the
> remap  only happen to some of the pgs ? Or crush from adding  a pg is
> different from a new osdmap ? I konw I must understand something
> wrong. I appreciate if you can explain more about the logic of adding
> a osd . Or is there  more doc that I can read ? Thank you very much
> !!! : )
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com