On Mon, 25 Oct 2021 13:08:38 -0700 Kalesh Singh <kaleshsingh@xxxxxxxxxx> wrote: > == Results == > > Divisor is a power of 2 (divisor == 32): > > test_hist_field_div_not_optimized | 8,717,091 cpu-cycles > test_hist_field_div_optimized | 1,643,137 cpu-cycles > > If the divisor is a power of 2, the optimized version is ~5.3x faster. > > Divisor is not a power of 2 (divisor == 33): > > test_hist_field_div_not_optimized | 4,444,324 cpu-cycles > test_hist_field_div_optimized | 5,497,958 cpu-cycles To optimize this even more, if the divisor is constant, we could make a separate function to not do the branch, and just shift or divide. And even if it is not a power of 2, for constants, we could implement a multiplication and shift, and guarantee an accuracy up to a defined max. If div is a constant, then we can calculate the mult and shift, and max dividend. Let's use 20 for shift. // This works best for small divisors if (div > max_div) { // only do a real division return; } shift = 20; mult = ((1 << shift) + div - 1) / div; delta = mult * div - (1 << shift); if (!delta) { /* div is a power of 2 */ max = -1; return; } max = (1 << shift) / delta; We would of course need to use 64 bit operations (maybe only do this for 64 bit machines). And perhaps even use bigger shift values to get a bigger max. Then we could do: if (val1 < max) return (val1 * mult) >> shift; -- Steve