On 3/29/24 12:15 PM, Bart Van Assche wrote:
> On 3/29/24 2:12 AM, I Hsin Cheng wrote:
>> As the result shown, the origin version of integer square root, which is
>> "int_sqrt" takes 35.37 msec task-clock, 1,2181,3348 cycles, 1,6095,3665
>> instructions, 2551,2990 branches and causes 1,0616 branch-misses.
>>
>> At the same time, the variant version of integer square root, which is
>> "int_fastsqrt" takes 33.96 msec task-clock, 1,1645,7487 cycles,
>> 5621,0086 instructions, 321,0409 branches and causes 2407 branch-misses.
>> We can clearly see that "int_fastsqrt" performs faster and better result
>> so it's indeed a faster invariant of integer square root.
>
> I'm not sure that a 4% performance improvement is sufficient to
> replace the int_sqrt() implementation. Additionally, why to add a
> second implementation of int_sqrt() instead of replacing the
> int_sqrt() implementation in lib/math/int_sqrt.c?

That's the real question imho - if it provides the same numbers and is
faster, why have two?

I ran a quick test because I was curious, and the precision is definitely
worse. The claim that it is floor(sqrt(val)) is not true. Trivial example:

1005117225
	sqrt()		31703.58
	int_sqrt()	30703
	int_fastsqrt()	30821

Whether this matters, probably not, but then again it's hard to care about
a slow path sqrt calculation. I'd certainly err on the side of precision
for that.

--
Jens Axboe
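
For anyone who wants to reproduce the floor(sqrt(val)) check, here is a
minimal userspace sketch. ref_isqrt() and is_floor_sqrt() are hypothetical
helper names made up for illustration; they are not from the kernel tree or
from the posted patch. The reference routine is a plain digit-by-digit
integer square root, and the checker tests r*r <= val < (r+1)*(r+1).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical reference: digit-by-digit (base 4) integer square root.
 * Returns floor(sqrt(val)) for any 64-bit input.
 */
static uint64_t ref_isqrt(uint64_t val)
{
	uint64_t res = 0;
	uint64_t bit = (uint64_t)1 << 62;	/* highest power of 4 in 64 bits */

	/* Start at the largest power of 4 that does not exceed val. */
	while (bit > val)
		bit >>= 2;

	while (bit != 0) {
		if (val >= res + bit) {
			val -= res + bit;
			res = (res >> 1) + bit;
		} else {
			res >>= 1;
		}
		bit >>= 2;
	}
	return res;
}

/*
 * Hypothetical checker: true if r == floor(sqrt(val)).
 * Uses val - r*r <= 2*r instead of computing (r+1)*(r+1), which would
 * overflow for r == UINT32_MAX.
 */
static bool is_floor_sqrt(uint64_t val, uint64_t r)
{
	if (r > UINT32_MAX)
		return false;	/* r*r would not fit in 64 bits */
	if (r * r > val)
		return false;
	return val - r * r <= 2 * r;
}
```

Feeding a candidate implementation's output to is_floor_sqrt() over a range
of inputs shows where it drifts. For val = 1005117225 the exact floor is
31703 (31703^2 = 1005080209, 31704^2 = 1005143616), so
is_floor_sqrt(1005117225, 30821) returns false.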