RE: straw2 and ln calculation

Sage Weil <sweil@xxxxxxxxxx> · Fri, 16 Jan 2015 09:18:10 -0800 (PST)

On Fri, 16 Jan 2015, Chen, Xiaoxi wrote:
> Hi Sage,
>        I have created a new pull request 
> (https://github.com/ceph/ceph/pull/3386) , which changed the crush_ln to 
> give more 32 digits, thus we don't need to do left shift now.
>         You win the bet :), the distribution is much better now, 
> originally the std_dev is 88~121, but now with 32 more digits, we can 
> reduce it to 12.4895(even lower than expected?????).

I had the stddev calculation wrong (missing a divide by n), now it looks 
more realistic.  The deviation does still increase with highly skewed 
weights, but I think that is inevitable because the low-weight items have 
a smaller number of placements.  I think a standard deviation probably 
isn't quite the right tool to measure this anyway (at least not the way 
I'm doing it, where I normalize the counts to the average weight).

In any case, this looks great now!  I pushed this and a few other cleanups 
to the wip-crush-straw2 branch, which I think should be in a final form 
now.

Can people please do a final review before we merge?

The final step will be to push this into the kernel client as well.

Thanks, everyone!
sage

 > 
>         Also note that 8 more digits and 32 more digits show almost the same result.
> 
> For 32 more digits:
> 
> expect                  169.664
> osd     weight  count   adjusted
> 0       1       179     179
> 1       1.75    291     166
> 2       3.062   507     165
> 3       5.359   937     174
> 4       9.379   1581    168
> 5       16.41   2771    168
> 6       28.72   4847    168
> 7       50.27   8690    172
> 8       87.96   14924   169
> 9       153.9   26069   169
> 10      269.4   45706   169
> 11      471.4   79754   169
> 12      825     139847  169
> 13      1444    245707  170
> 14      2527    428190  169
> std dev 12.4895
>      vs 12.6005 (expected)
> 
> For 8 more digits
> 
> expect                  169.664
> osd     weight  count   adjusted
> 0       1       179     179
> 1       1.75    290     165
> 2       3.062   507     165
> 3       5.359   936     174
> 4       9.379   1580    168
> 5       16.41   2771    168
> 6       28.72   4846    168
> 7       50.27   8690    172
> 8       87.96   14923   169
> 9       153.9   26068   169
> 10      269.4   45706   169
> 11      471.4   79754   169
> 12      825     139847  169
> 13      1444    245710  170
> 14      2527    428193  169
> std dev 12.5935
>      vs 12.5983 (expected).
> 
> 
> 
> 												Xiaoxi
> 
> 
> -----Original Message-----
> From: Sage Weil [mailto:sweil@xxxxxxxxxx] 
> Sent: Friday, January 16, 2015 10:22 AM
> To: Chen, Xiaoxi; Blinick, Stephen L
> Cc: ceph-devel@xxxxxxxxxxxxxxx
> Subject: straw2 and ln calculation
> 
> Hi everyone,
> 
> I took at look at the optimized straw2 code today.  There was one typo
> (s/i/u/) and it looks like it works as well as the previous version.  I've pushed the latest code and some additional tests to the wip-crush-straw2 branch.
> 
> I went on to do some tests of the variance in the resulting distribution when the weights are all identical vs somewhat variable vs very skewed. I found that for a more 'typical' distribution of weights (e.g, .5 - 3) the std deviation stays flat.  However, for very large skews, there are interesting effects that emerge based on very small differences in the ln value.  For example, here we adjust the table value into a negative integer in mapper.c:
> 
> 		ln = crush_ln(u) - 0x10000;
> 
> If I adjust that 0x10000 value only slightly, and generate a distribution across a set of scaled weights (e.g., 1, 2, 4, 8, 16, 32), I will see the low weights either more heavily or less heavily weighted than they should be.  0x10000 skews them a bit low, 0xffff skews them a bit high.  e.g., for 0x10000 I get
> 
> expect			169.664
> osd	weight	count	adjusted
> 0	1	102	102    <-- here
> 1	1.75	212	121
> 2	3.062	442	144
> 3	5.359	858	160
> 4	9.379	1483	158
> 5	16.41	2673	162
> 6	28.72	4756	165
> 7	50.27	8620	171
> 8	87.96	14851	168
> 9	153.9	26019	169
> 10	269.4	45674	169
> 11	471.4	79768	169
> 12	825	139949	169
> 13	1444	245958	170
> 14	2527	428635	169
> std dev 88.6993
>      vs 12.1484	(expected)
> 
> While for 0xffff I get:
> 
> expect			169.664
> osd	weight	count	adjusted
> 0	1	272	272
> 1	1.75	395	225
> 2	3.062	596	194
> 3	5.359	1009	188
> 4	9.379	1665	177
> 5	16.41	2869	174
> 6	28.72	4936	171
> 7	50.27	8769	174
> 8	87.96	14981	170
> 9	153.9	26102	169
> 10	269.4	45729	169
> 11	471.4	79737	169
> 12	825	139719	169
> 13	1444	245486	170
> 14	2527	427735	169
> std dev 121.243
>      vs 13.1205	(expected)
> 
> This could be written off to chance, but I see the same effect amplified for 0xfffe and 0x10001.
> 
> I suppose if we have to choose I think skewing high makes more sense since it means the high weight items won't skew high and overfill (although in practice that doesn't seem to be happening.. it's only on the low end that things get wonky.. the high end looks quite good).
> 
> In any case, two things:
> 
> 1) Maybe someone can check my math on the stddev calculation.  I'm scaling the actual placement count by the weight (adj = count/weight) and then doing the stddev of the adjusted values.  But the expected value is always about 1/3 or 1/4 of that.  I can't tell if that's because the hash is weak or because I'm doing something wrong with the calculation.  Notably, if I plug in rand() with equal weights for a sanity check, I still get a stddev of 953 vs expected 249... about what I see from straw2.  Strangely when I use straw I get a bit better than taht, 476.  Am I crazy?  See:
> 
>  ./unittest_crush_wrapper --gtest_filter=CrushWrapper.straw2
> 
> 2) Given how sensitive the ln value is to 0xffff vs 0x10000, I'm highly suspicious that slightly better precision will help us out.  Notably, we do
> 
> 		ln = crush_ln(u) - 0x10000;
> 
> 		/*
> 		 * divide by 16.16 fixed-point weight
> 		 */
> 		draw = (ln << 32) / bucket->item_weights[i];
> 
> ...so some extra bits of precision (instead of having the lower 32 zeroed) would probably help a lot.
> 
> Can crush_ln() be modifed to provide more bits of precision without increasing the size of the tables?  I don't really understand the math it's based on, but it looks like the same curve is being added in with different precision or something.. and a lot of bits in LH are being thrown out here
> 
>     LH >>= (48-12);
> 
> If there were 8 more bits returned from crush_ln and we shifted by 8 fewer bits in bucket_straw2_choose, for example, I bet we could eliminate some of the low-weight effect I'm seeing.
> 
> To see what I'm seeing, you can run
> 
>  ./unittest_crush_wrapper --gtest_filter=CrushWrapper.straw2_stddev
> 
> Any input would be helpful, thanks!
> sage
> 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html