As the comments in rwb_arm_timer says, we should speed up the process of integer square root with some variant versions. The implementation of the variant integer square root "int_fast_sqrt()" is based on the equation $lg(\sqrt{x}) = lg(x) / 2$ , where "lg" stands for logarithm with base 2. After we take the first approximation by calculate $2^{lg(x)/2}$ , we utilize Newton's method for three times in order to enhance the precision. To prove "int_fastsqrt" is indeed faster than the "int_sqrt", we take some bench experiments to calculate the integer logarithm from 1 to 10^6 and use "perf stat --repeat 100" to measure the performance of each function. As the result shown, the origin version of integer square root, which is "int_sqrt" takes 35.37 msec task-clock, 1,2181,3348 cycles, 1,6095,3665 instructions, 2551,2990 branches and causes 1,0616 branch-misses. At the same time, the variant version of integer square root, which is "int_fastsqrt" takes 33.96 msec task-clock, 1,1645,7487 cyclces, 5621,0086 instructions, 321,0409 branches and causes 2407 branch-misses. We can clearly see that "int_fastsqrt" performs faster and better result so it's indeed a faster invariant of integer square root. The experiments runs on x86_64 GNU/Linux Architecture and the CPU is Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz. Signed-off-by: I Hsin Cheng <richard120310@xxxxxxxxx> --- block/blk-wbt.c | 2 +- lib/math/int_sqrt.c | 21 +++++++++++++++++++++ 2 files changed, 22 insertions(+), 1 deletion(-) diff --git a/block/blk-wbt.c b/block/blk-wbt.c index 0bb613139..8d25c0e55 100644 --- a/block/blk-wbt.c +++ b/block/blk-wbt.c @@ -407,7 +407,7 @@ static void rwb_arm_timer(struct rq_wb *rwb) * though. */ rwb->cur_win_nsec = div_u64(rwb->win_nsec << 4, - int_sqrt((rqd->scale_step + 1) << 8)); + int_fastsqrt((rqd->scale_step + 1) << 8)); } else { /* * For step < 0, we don't want to increase/decrease the diff --git a/lib/math/int_sqrt.c b/lib/math/int_sqrt.c index a8170bb91..e25ba179e 100644 --- a/lib/math/int_sqrt.c +++ b/lib/math/int_sqrt.c @@ -40,6 +40,27 @@ unsigned long int_sqrt(unsigned long x) } EXPORT_SYMBOL(int_sqrt); + +/** + * int_fastsqrt - faster invariant of int_sqrt + * @x: integer of which to calculate the sqrt + * + * Computes: floor(sqrt(x)) + */ + +unsigned long int_fastsqrt(unsigned long x) +{ + unsigned long y = ((31 - __builtin_clz(x | 1)) >> 1); + unsigned long z = 1 << y; + + z = (z + (x / z)) >> 1; + z = (z + (x / z)) >> 1; + z = (z + (x / z)) >> 1; + + return z; +} +EXPORT_SYMBOL(int_fastsqrt); + #if BITS_PER_LONG < 64 /** * int_sqrt64 - strongly typed int_sqrt function when minimum 64 bit input -- 2.34.1