在 2016/5/7 16:41, George Spelvin 写道:
Nothing critical, but a bit of kibitzing.
(That is slang in the Yiddish language for a person
who offers annoying and unwanted advice.)
The binary GCD algorithm is based on the following facts:
1. If a and b are all evens, then gcd(a,b) = 2 * gcd(a/2, b/2)
2. If a is even and b is odd, then gcd(a,b) = gcd(a/2, b)
3. If a and b are all odds, then gcd(a,b) = gcd((a-b)/2, b) = gcd((a+b)/2, b)
1) "even" and "odd" are adjectives. In English, adjectives to not have
plural suffixes. Thus, "they are even" or "they are odd".
2) Although "all" is not exactly wrong, it sounds odd. Since there are
exactly two of them it's clearer to say "both".
If I also rephrase the last line to fit into 80 columns, you get:
The binary GCD algorithm is based on the following facts:
- 1. If a and b are all evens, then gcd(a,b) = 2 * gcd(a/2, b/2)
+ 1. If a and b are both even, then gcd(a,b) = 2 * gcd(a/2, b/2)
2. If a is even and b is odd, then gcd(a,b) = gcd(a/2, b)
- 3. If a and b are all odds, then gcd(a,b) = gcd((a-b)/2, b) = gcd((a+b)/2, b)
+ 3. If both are odd, then gcd(a,b) = gcd((a-b)/2, b) = gcd((a+b)/2, b)
3) Negative config options are always confusing.
Would it be better to call it CONFIG_INEFFICIEBNT_FFS, or even simpler
CONFIG_SLOW_FFS?
Also, you're allowed to add "help" to a non-interactive config option,
and some documentation might be useful.
E.g.
+config CPU_SLOW_FFS
+ def_bool n
+ help
+ If n, the CPU supports a fast __ffs (__builtin_ctz) operation,
+ either directly or via a short code sequence using a count
+ leading zeros or population count instruction. If y, the
+ operation is emulated by slower software, such as an unrolled
+ binary search.
+
+ This is purely an optimization option; the kernel
+ will function correctly regardless of how it is set.
+
Thanks a lot.
Your benchmark code doesn't have to have a separate code path if
__x86_64__; rdtsc works on 32-bit code just as well. paths. And the
way you subtract the end and start times is unnecessarily complicated.
The C language guarantees that unsigned arithmetic simply wraps modulo
2^bits as expected.
rdsc works on x86, the other path is prepared for other architectures.
Here are some more timings, with the same flags as your tests:
First, 32 bit code:
gcd0 gcd1 gcd2 gcd3 gcd4
Ivy Bridge 3156 1192 1740 1160 1640 PASS
AMD Phenom 7150 2564 2348 2975 2843 PASS
Core 2 4176 2592 4164 2604 3900 PASS
Pentium 4 11492 4784 7632 4852 6452 PASS
And 64-bit (longer times becuase the inputs are larger):
Ivy Bridge 10636 2496 3500 2432 3360 PASS
AMD Phenom 19482 4058 6030 5001 6845 PASS
Looking at those, I'm not sure how much better the gcd3/4 versions are
than gcd1/2. The difference seems pretty minor and sometimes negative.
The worst case of binary GCD is that one number is power of 2,
for example a is 0xffffffff (0xcccccccc for the "even/odd" variant) and b is 1.
The gcd3/4 versions can handle this properly.
--
To unsubscribe from this list: send the line "unsubscribe linux-m68k" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html