On Tue, Oct 26, 2021 at 12:14 PM Steven Rostedt <rostedt@xxxxxxxxxxx> wrote: > > On Mon, 25 Oct 2021 13:08:38 -0700 > Kalesh Singh <kaleshsingh@xxxxxxxxxx> wrote: > > > == Results == > > > > Divisor is a power of 2 (divisor == 32): > > > > test_hist_field_div_not_optimized | 8,717,091 cpu-cycles > > test_hist_field_div_optimized | 1,643,137 cpu-cycles > > > > If the divisor is a power of 2, the optimized version is ~5.3x faster. > > > > Divisor is not a power of 2 (divisor == 33): > > > > test_hist_field_div_not_optimized | 4,444,324 cpu-cycles > > test_hist_field_div_optimized | 5,497,958 cpu-cycles > > To optimize this even more, if the divisor is constant, we could make a > separate function to not do the branch, and just shift or divide. Ack. I can update to use separate functions for the constant divisors. > > And even if it is not a power of 2, for constants, we could implement a > multiplication and shift, and guarantee an accuracy up to a defined max. > > > If div is a constant, then we can calculate the mult and shift, and max > dividend. Let's use 20 for shift. > > // This works best for small divisors > if (div > max_div) { > // only do a real division > return; > } > shift = 20; > mult = ((1 << shift) + div - 1) / div; > delta = mult * div - (1 << shift); > if (!delta) { > /* div is a power of 2 */ > max = -1; > return; > } > max = (1 << shift) / delta; I'm still trying to digest the above algorithm. But doesn't this add 2 extra divisions? What am I missing here? Thanks, Kalesh > > We would of course need to use 64 bit operations (maybe only do this for 64 > bit machines). And perhaps even use bigger shift values to get a bigger max. > > Then we could do: > > if (val1 < max) > return (val1 * mult) >> shift; > > -- Steve