? 2016/5/7 16:41, George Spelvin ??: > Nothing critical, but a bit of kibitzing. > (That is slang in the Yiddish language for a person > who offers annoying and unwanted advice.) > >> The binary GCD algorithm is based on the following facts: >> 1. If a and b are all evens, then gcd(a,b) = 2 * gcd(a/2, b/2) >> 2. If a is even and b is odd, then gcd(a,b) = gcd(a/2, b) >> 3. If a and b are all odds, then gcd(a,b) = gcd((a-b)/2, b) = gcd((a+b)/2, b) > 1) "even" and "odd" are adjectives. In English, adjectives to not have > plural suffixes. Thus, "they are even" or "they are odd". > 2) Although "all" is not exactly wrong, it sounds odd. Since there are > exactly two of them it's clearer to say "both". > > If I also rephrase the last line to fit into 80 columns, you get: > > The binary GCD algorithm is based on the following facts: > - 1. If a and b are all evens, then gcd(a,b) = 2 * gcd(a/2, b/2) > + 1. If a and b are both even, then gcd(a,b) = 2 * gcd(a/2, b/2) > 2. If a is even and b is odd, then gcd(a,b) = gcd(a/2, b) > - 3. If a and b are all odds, then gcd(a,b) = gcd((a-b)/2, b) = gcd((a+b)/2, b) > + 3. If both are odd, then gcd(a,b) = gcd((a-b)/2, b) = gcd((a+b)/2, b) > > 3) Negative config options are always confusing. > > Would it be better to call it CONFIG_INEFFICIEBNT_FFS, or even simpler > CONFIG_SLOW_FFS? > > Also, you're allowed to add "help" to a non-interactive config option, > and some documentation might be useful. > E.g. > > +config CPU_SLOW_FFS > + def_bool n > + help > + If n, the CPU supports a fast __ffs (__builtin_ctz) operation, > + either directly or via a short code sequence using a count > + leading zeros or population count instruction. If y, the > + operation is emulated by slower software, such as an unrolled > + binary search. > + > + This is purely an optimization option; the kernel > + will function correctly regardless of how it is set. > + Thanks a lot. > Your benchmark code doesn't have to have a separate code path if > __x86_64__; rdtsc works on 32-bit code just as well. paths. And the > way you subtract the end and start times is unnecessarily complicated. > The C language guarantees that unsigned arithmetic simply wraps modulo > 2^bits as expected. rdsc works on x86, the other path is prepared for other architectures. > Here are some more timings, with the same flags as your tests: > > First, 32 bit code: > gcd0 gcd1 gcd2 gcd3 gcd4 > Ivy Bridge 3156 1192 1740 1160 1640 PASS > AMD Phenom 7150 2564 2348 2975 2843 PASS > Core 2 4176 2592 4164 2604 3900 PASS > Pentium 4 11492 4784 7632 4852 6452 PASS > > And 64-bit (longer times becuase the inputs are larger): > Ivy Bridge 10636 2496 3500 2432 3360 PASS > AMD Phenom 19482 4058 6030 5001 6845 PASS > > Looking at those, I'm not sure how much better the gcd3/4 versions are > than gcd1/2. The difference seems pretty minor and sometimes negative. > The worst case of binary GCD is that one number is power of 2, for example a is 0xffffffff (0xcccccccc for the "even/odd" variant) and b is 1. The gcd3/4 versions can handle this properly.