RE: question about the 'r' value in CRUSH

On Mon, 25 Jan 2016, Xusangdi wrote:
> In a normal scenario this behavior is less pronounced, but it still occurs. Please see below:
> 
> $ crushtool -o crushmap --build --num_osds 12 host straw2 3 root straw2 0
> $ crushtool -i crushmap --test --x 0 --num-rep 3 --show-mappings --weight 0 0
[...]

The retries are normal.  You just need to set choose_tries to a large 
enough value (default is now 50, I believe) so that you always get a 
full-sized result.  It is a problem, though, if you have a very small set 
of potentially suitable results (as with your 3-item tree with skewed 
weights, or perhaps a larger tree with most of the OSDs marked 'out').
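
To make the retry pattern concrete, here is a rough Python sketch of the
firstn retry loop as it appears in the traces in this thread: each replica
position starts at r = rep and bumps r by one per failed try, so a later
replica re-walks r values that an earlier one already visited. The pick()
function is a hypothetical stand-in (SHA-256 based), not the real straw2 /
rjenkins selection, and the names choose_firstn and total_tries are made up
for illustration; only the r = rep + ftotal bookkeeping and the tries limit
(50 + 1, matching "tries 51" in the trace) are taken from the logs.

```python
import hashlib


def pick(weights, x, r):
    """Toy stand-in for crush_bucket_choose: a deterministic weighted pick
    that depends only on (x, r). NOT the real straw2/rjenkins logic."""
    digest = hashlib.sha256(f"{x}:{r}".encode()).digest()
    u = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    target = u * sum(weights)
    acc = 0.0
    for idx, w in enumerate(weights):
        acc += w
        if target < acc:
            return idx
    return len(weights) - 1


def choose_firstn(weights, x, numrep, total_tries=50):
    """Pick up to numrep distinct items, retrying on collision.

    Returns (out, trace), where trace lists every (rep, r, item) attempt.
    """
    out, trace = [], []
    for rep in range(numrep):
        ftotal = 0
        while ftotal <= total_tries:  # total_tries + 1 attempts per replica
            r = rep + ftotal          # same r for the same (rep, ftotal) sum
            item = pick(weights, x, r)
            trace.append((rep, r, item))
            if item not in out:       # no collision: accept this item
                out.append(item)
                break
            ftotal += 1               # collision: retry with the next r
    return out, trace


if __name__ == "__main__":
    out, trace = choose_firstn([3, 3, 30], x=3, numrep=3)
    r_rep1 = {r for rep, r, _ in trace if rep == 1}
    r_rep2 = {r for rep, r, _ in trace if rep == 2}
    print("result:", out)
    print("r values tried for both replica 1 and replica 2:",
          sorted(r_rep1 & r_rep2))
```

Because the pick depends only on (x, r), every r value shared between two
replica positions returns the same item both times, which is why those tries
show up as guaranteed collisions in the trace.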

sage



> CRUSHCHOOSE_LEAF bucket -5 x 0 outpos 0 numrep 3 tries 51 recurse_tries 1 local_retries 0 local_fallback_retries 0 parent_r 0 stable 0
>  crush_bucket_choose -5 x=0 r=0
>   item -1 type 1
> CHOOSE bucket -1 x 0 outpos 0 numrep 1 tries 1 recurse_tries 0 local_retries 0 local_fallback_retries 0 parent_r 0 stable 0
>  crush_bucket_choose -1 x=0 r=0
>   item 0 type 0
>   reject 1  collide 0  ftotal 1  flocal 1
> skip rep
> CHOOSE returns 0
>   reject 1  collide 0  ftotal 1  flocal 1
>  crush_bucket_choose -5 x=0 r=1
>   item -4 type 1
> CHOOSE bucket -4 x 0 outpos 0 numrep 1 tries 1 recurse_tries 0 local_retries 0 local_fallback_retries 0 parent_r 0 stable 0
>  crush_bucket_choose -4 x=0 r=0
>   item 11 type 0
> CHOOSE got 11
> CHOOSE returns 1
> CHOOSE got -4
>  crush_bucket_choose -5 x=0 r=1 <== redundant try
>   item -4 type 1
>   reject 0  collide 1  ftotal 1  flocal 1
>  crush_bucket_choose -5 x=0 r=2
>   item -4 type 1
>   reject 0  collide 1  ftotal 2  flocal 1
>  crush_bucket_choose -5 x=0 r=3
>   item -2 type 1
> CHOOSE bucket -2 x 0 outpos 1 numrep 2 tries 1 recurse_tries 0 local_retries 0 local_fallback_retries 0 parent_r 0 stable 0
>  crush_bucket_choose -2 x=0 r=1
>   item 5 type 0
> CHOOSE got 5
> CHOOSE returns 2
> CHOOSE got -2
>  crush_bucket_choose -5 x=0 r=2 <== redundant try
>   item -4 type 1
>   reject 0  collide 1  ftotal 1  flocal 1
>  crush_bucket_choose -5 x=0 r=3 <== redundant try
>   item -2 type 1
>   reject 0  collide 1  ftotal 2  flocal 1
>  crush_bucket_choose -5 x=0 r=4
>   item -2 type 1
>   reject 0  collide 1  ftotal 3  flocal 1
>  crush_bucket_choose -5 x=0 r=5
>   item -1 type 1
> CHOOSE bucket -1 x 0 outpos 2 numrep 3 tries 1 recurse_tries 0 local_retries 0 local_fallback_retries 0 parent_r 0 stable 0
>  crush_bucket_choose -1 x=0 r=2
>   item 0 type 0
>   reject 1  collide 0  ftotal 1  flocal 1
> skip rep
> CHOOSE returns 2
>   reject 1  collide 0  ftotal 4  flocal 1
>  crush_bucket_choose -5 x=0 r=6
>   item -4 type 1
>   reject 0  collide 1  ftotal 5  flocal 1
>  crush_bucket_choose -5 x=0 r=7
>   item -4 type 1
>   reject 0  collide 1  ftotal 6  flocal 1
>  crush_bucket_choose -5 x=0 r=8
>   item -2 type 1
>   reject 0  collide 1  ftotal 7  flocal 1
>  crush_bucket_choose -5 x=0 r=9
>   item -4 type 1
>   reject 0  collide 1  ftotal 8  flocal 1
>  crush_bucket_choose -5 x=0 r=10
>   item -3 type 1
> CHOOSE bucket -3 x 0 outpos 2 numrep 3 tries 1 recurse_tries 0 local_retries 0 local_fallback_retries 0 parent_r 0 stable 0
>  crush_bucket_choose -3 x=0 r=2
>   item 7 type 0
> CHOOSE got 7
> CHOOSE returns 3
> CHOOSE got -3
> CHOOSE returns 3
>  rule 0 x 0 [11,5,7]
> 
> - - - - - - - - - - - - - - - - - - - -
> Sangdi Xu
> UIS 2, Team BORE
> 
> > -----Original Message-----
> > From: Sage Weil [mailto:sweil@xxxxxxxxxx]
> > Sent: Sunday, January 24, 2016 10:35 PM
> > To: xusangdi 11976 (RD)
> > Cc: ceph-devel@xxxxxxxxxxxxxxx
> > Subject: Re: question about the 'r' value in CRUSH
> >
> > On Sat, 23 Jan 2016, Xusangdi wrote:
> > > Hi Sage,
> > >
> > > While learning about CRUSH we recently ran into an interesting case; please see below:
> > >
> > > root root {
> > >     id -4       # do not change unnecessarily
> > >     # weight 36.000
> > >     alg straw2
> > >     hash 0  # rjenkins1
> > >     item host0 weight 3.000
> > >     item host1 weight 3.000
> > >     item host2 weight 30.000
> > > }
> > >
> > > CRUSHCHOOSE_LEAF bucket -4 x 3 outpos 0 numrep 3 tries 51 recurse_tries 1 local_retries 0 local_fallback_retries 0 parent_r 0 stable 0
> > >  crush_bucket_choose -4 x=3 r=0
> > >   item -3 type 1
> > > CHOOSE bucket -3 x 3 outpos 0 numrep 1 tries 1 recurse_tries 0 local_retries 0 local_fallback_retries 0 parent_r 0 stable 0
> > >  crush_bucket_choose -3 x=3 r=0
> > >   item 8 type 0
> > > CHOOSE got 8
> > > CHOOSE returns 1
> > > CHOOSE got -3
> > >  crush_bucket_choose -4 x=3 r=1
> > >   item -3 type 1
> > >   reject 0  collide 1  ftotal 1  flocal 1
> > >  crush_bucket_choose -4 x=3 r=2
> > >   item -3 type 1
> > >   reject 0  collide 1  ftotal 2  flocal 1
> > >  crush_bucket_choose -4 x=3 r=3
> > >   item -3 type 1
> > >   reject 0  collide 1  ftotal 3  flocal 1
> > >  crush_bucket_choose -4 x=3 r=4
> > >   item -3 type 1
> > >   reject 0  collide 1  ftotal 4  flocal 1
> > >  crush_bucket_choose -4 x=3 r=5
> > >   item -3 type 1
> > >   reject 0  collide 1  ftotal 5  flocal 1
> > >  crush_bucket_choose -4 x=3 r=6
> > >   item -3 type 1
> > >   reject 0  collide 1  ftotal 6  flocal 1
> > >  crush_bucket_choose -4 x=3 r=7
> > >   item -3 type 1
> > >   reject 0  collide 1  ftotal 7  flocal 1
> > >  crush_bucket_choose -4 x=3 r=8
> > >   item -3 type 1
> > >   reject 0  collide 1  ftotal 8  flocal 1
> > >  crush_bucket_choose -4 x=3 r=9
> > >   item -3 type 1
> > >   reject 0  collide 1  ftotal 9  flocal 1
> > >  crush_bucket_choose -4 x=3 r=10
> > >   item -3 type 1
> > >   reject 0  collide 1  ftotal 10  flocal 1
> > >  crush_bucket_choose -4 x=3 r=11
> > >   item -3 type 1
> > >   reject 0  collide 1  ftotal 11  flocal 1
> > >  crush_bucket_choose -4 x=3 r=12
> > >   item -3 type 1
> > >   reject 0  collide 1  ftotal 12  flocal 1
> > >  crush_bucket_choose -4 x=3 r=13
> > >   item -3 type 1
> > >   reject 0  collide 1  ftotal 13  flocal 1
> > >  crush_bucket_choose -4 x=3 r=14
> > >   item -3 type 1
> > >   reject 0  collide 1  ftotal 14  flocal 1
> > >  crush_bucket_choose -4 x=3 r=15
> > >   item -3 type 1
> > >   reject 0  collide 1  ftotal 15  flocal 1
> > >  crush_bucket_choose -4 x=3 r=16
> > >   item -3 type 1
> > >   reject 0  collide 1  ftotal 16  flocal 1
> > >  crush_bucket_choose -4 x=3 r=17
> > >   item -3 type 1
> > >   reject 0  collide 1  ftotal 17  flocal 1
> > >  crush_bucket_choose -4 x=3 r=18
> > >   item -3 type 1
> > >   reject 0  collide 1  ftotal 18  flocal 1
> > >  crush_bucket_choose -4 x=3 r=19
> > >   item -2 type 1
> > > CHOOSE bucket -2 x 3 outpos 1 numrep 2 tries 1 recurse_tries 0 local_retries 0 local_fallback_retries 0 parent_r 0 stable 0
> > >  crush_bucket_choose -2 x=3 r=1
> > >   item 4 type 0
> > > CHOOSE got 4
> > > CHOOSE returns 2
> > > CHOOSE got -2
> > >  crush_bucket_choose -4 x=3 r=2
> > >   item -3 type 1
> > >   reject 0  collide 1  ftotal 1  flocal 1
> > >  crush_bucket_choose -4 x=3 r=3
> > >   item -3 type 1
> > >   reject 0  collide 1  ftotal 2  flocal 1
> > >  crush_bucket_choose -4 x=3 r=4
> > >   item -3 type 1
> > >   reject 0  collide 1  ftotal 3  flocal 1
> > >  crush_bucket_choose -4 x=3 r=5
> > >   item -3 type 1
> > >   reject 0  collide 1  ftotal 4  flocal 1
> > >  crush_bucket_choose -4 x=3 r=6
> > >   item -3 type 1
> > >   reject 0  collide 1  ftotal 5  flocal 1
> > >  crush_bucket_choose -4 x=3 r=7
> > >   item -3 type 1
> > >   reject 0  collide 1  ftotal 6  flocal 1
> > >  crush_bucket_choose -4 x=3 r=8
> > >   item -3 type 1
> > >   reject 0  collide 1  ftotal 7  flocal 1
> > >  crush_bucket_choose -4 x=3 r=9
> > >   item -3 type 1
> > >   reject 0  collide 1  ftotal 8  flocal 1
> > >  crush_bucket_choose -4 x=3 r=10
> > >   item -3 type 1
> > >   reject 0  collide 1  ftotal 9  flocal 1
> > >  crush_bucket_choose -4 x=3 r=11
> > >   item -3 type 1
> > >   reject 0  collide 1  ftotal 10  flocal 1
> > >  crush_bucket_choose -4 x=3 r=12
> > >   item -3 type 1
> > >   reject 0  collide 1  ftotal 11  flocal 1
> > >  crush_bucket_choose -4 x=3 r=13
> > >   item -3 type 1
> > >   reject 0  collide 1  ftotal 12  flocal 1
> > >  crush_bucket_choose -4 x=3 r=14
> > >   item -3 type 1
> > >   reject 0  collide 1  ftotal 13  flocal 1
> > >  crush_bucket_choose -4 x=3 r=15
> > >   item -3 type 1
> > >   reject 0  collide 1  ftotal 14  flocal 1
> > >  crush_bucket_choose -4 x=3 r=16
> > >   item -3 type 1
> > >   reject 0  collide 1  ftotal 15  flocal 1
> > >  crush_bucket_choose -4 x=3 r=17
> > >   item -3 type 1
> > >   reject 0  collide 1  ftotal 16  flocal 1
> > >  crush_bucket_choose -4 x=3 r=18
> > >   item -3 type 1
> > >   reject 0  collide 1  ftotal 17  flocal 1
> > >  crush_bucket_choose -4 x=3 r=19
> > >   item -2 type 1
> > >   reject 0  collide 1  ftotal 18  flocal 1
> > >  crush_bucket_choose -4 x=3 r=20
> > >   item -2 type 1
> > >   reject 0  collide 1  ftotal 19  flocal 1
> > >  crush_bucket_choose -4 x=3 r=21
> > >   item -3 type 1
> > >   reject 0  collide 1  ftotal 20  flocal 1
> > >  crush_bucket_choose -4 x=3 r=22
> > >   item -3 type 1
> > >   reject 0  collide 1  ftotal 21  flocal 1
> > >  crush_bucket_choose -4 x=3 r=23
> > >   item -1 type 1
> > > CHOOSE bucket -1 x 3 outpos 2 numrep 3 tries 1 recurse_tries 0 local_retries 0 local_fallback_retries 0 parent_r 0 stable 0
> > >  crush_bucket_choose -1 x=3 r=2
> > >   item 0 type 0
> > > CHOOSE got 0
> > > CHOOSE returns 3
> > > CHOOSE got -1
> > > CHOOSE returns 3
> > >  rule 0 x 3 [8,4,0]
> > >
> > > It looks like, when choosing the third replica, we repeat many tries that were already made for the second one.
> > > Is this intended, or is it possibly an issue?
> >
> > It's because one bucket is weighted so much more heavily than the others.
> > You're asking for something that's somewhat impossible: 3 replicas coming from 3 items with different
> > weights.  Since we're sampling, if the weights are too skewed you'll run out of tries (r values) and it'll
> > give up with only 2 results.
> >
> > sage
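
As a back-of-the-envelope check on "running out of tries", the sketch below
treats each try as an independent weighted draw over the bucket's items. That
independence is an assumption made purely for illustration (real CRUSH picks
are a deterministic function of x and r), and the function names p_short and
expected_tries are hypothetical. Under that model, a replica slot stays
unfilled only if every one of its tries lands on an item that is already
taken.

```python
def p_short(weights, chosen, tries=51):
    """Probability that `tries` independent weighted draws all hit
    already-chosen items, leaving this replica slot unfilled.

    Idealized model: assumes independent draws, which real CRUSH does not do.
    """
    total = sum(weights)
    taken = sum(weights[i] for i in chosen)
    return (taken / total) ** tries


def expected_tries(weights, chosen):
    """Mean number of draws until one lands on a not-yet-chosen item
    (geometric distribution with success probability 1 - taken/total)."""
    total = sum(weights)
    taken = sum(weights[i] for i in chosen)
    return 1.0 / (1.0 - taken / total)


if __name__ == "__main__":
    weights = [3.0, 3.0, 30.0]  # host0, host1, host2 from the map above
    # Third replica, with host2 and one small host already taken:
    print(expected_tries(weights, chosen=[0, 2]))
    print(p_short(weights, chosen=[0, 2], tries=51))
```

With weights 3/3/30 and two hosts already taken, only 3 of 36 weight units
remain eligible, so the model predicts about 12 tries on average for the
third replica and roughly a 1% chance of exhausting 51 tries. With three
equally weighted hosts the same failure probability is negligible.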
