Thanks ! I still have a question. Like the code in bucket_straw2_choose below: u = crush_hash32_3(bucket->h.hash, x, ids[i], r); u &= 0xffff; ln = crush_ln(u) - 0x1000000000000ll; draw = div64_s64(ln, weights[i]); Because the x , id, r , don't change, so the ln won't change for old bucket, add osd or remove osd only change the weight. Suppose for pgi, there are 3 bucket(host) with weight 3w, add one host with weight w, there are 4 buckets with weight now. This means the movement will depend on ln value , am I understand right ? I don't understand how this make sure the new bucket get desirable pgs ? I read https://en.wikipedia.org/wiki/Jenkins_hash_function and http://en.wikipedia.org/wiki/Exponential_distribution, But I can't link them together to understand this ? Can you explain something about this ? Apologize for my dummy. And thank you very much . : ) On Fri, Sep 22, 2017 at 3:50 PM, Maged Mokhtar <mmokhtar@xxxxxxxxxxx> wrote: > Per section 3.4.4 The default bucket type straw computes the hash of (PG > number, replica number, bucket id) for all buckets using the Jenkins integer > hashing function, then multiply this by bucket weight (for OSD disks the > weight of 1 is for 1 TB, for higher level it is the sum of contained > weights). The selection function chooses the bucket/disk with the max value: > c(r,x) = maxi ( f (wi)hash(x, r, i)) > > So if you add a OSD disk, there is a new disk id that enters this > competition and will get PG from other OSDs proportional to its weight, > which is a desirable effect, but a side effect is that the weight hierarchy > has slightly changed so now some older buckets may win PGs from other older > buckets according to the hash function. > > So straw does have overhead when adding (rather than replacing), it does not > do minimal PG re-assignments. But it terms of overall efficiency of > adding/removing of buckets at end and in middle of hierarchy it is the best > overall over other algorithms as seen on chart 5 and table 2. > > On 2017-09-22 08:36, Will Zhao wrote: > > Hi Sage and all : > I am tring to understand cursh more deeply. I have tried to read > the code and paper, and search the mail list archives , but I still > have some questions and can't understand it well. > If I have 100 osds, and when I add a osd , the osdmap changes, > and how the pg is recaulated to make sure the data movement is > minimal. I tried to use crushtool --show-mappings --num-rep 3 --test > -i map , through changing the map for 100osds and 101 osds , to look > the result , it looks like the pgmap changed a lot . Shouldn't the > remap only happen to some of the pgs ? Or crush from adding a pg is > different from a new osdmap ? I konw I must understand something > wrong. I appreciate if you can explain more about the logic of adding > a osd . Or is there more doc that I can read ? Thank you very much > !!! : ) > _______________________________________________ > ceph-users mailing list > ceph-users@xxxxxxxxxxxxxx > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > > > _______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com