On Wed, Apr 08, 2020 at 09:36:16AM +1000, Benjamin Herrenschmidt wrote: > On Mon, 2020-04-06 at 23:02 -0700, Tao Ren wrote: > > I ran some testing on my ast2400 and ast2500 BMC and looks like the > > for() loop runs faster than for_each_set_bit_from() loop in my > > environment. I'm not sure if something needs to be revised in my test > > code, but please kindly share your suggestions: > > > > I use get_cycles() to calculate execution time of 2 different loops, and > > ast_vhub_dev_irq() is replaced with barrier() to avoid "noise"; below > > are the results: > > > > - when downstream port number is 5 and only 1 irq bit is set, it takes > > ~30 cycles to finish for_each_set_bit() loop, and 20-25 cycles to > > finish the for() loop. > > > > - if downstream port number is 5 and all 5 bits are set, then > > for_each_set_bit() loop takes ~50 cycles and for() loop takes ~25 > > cycles. > > > > - when I increase downsteam port number to 16 and set 1 irq bit, the > > for_each_set_bit() loop takes ~30 cycles and for() loop takes 25 > > cycles. It's a little surprise to me because I thought for() loop > > would cost 60+ cycles (3 times of the value when port number is 5). > > > > - if downstream port number is 16 and all irq status bits are set, > > then for_each_set_bit() loop takes 60-70 cycles and for() loop takes > > 30+ cycles. > > I suspect the CPU doesn't have an efficient find-zero-bit primitive, > check the generated asm. In that case I would go back to the simple for > loop. > > Cheers, > Ben. _find_next_bit_le() function is defined in arch/arm/lib/findbit.S. I'm looking at the code: will run more tests and send out patch v4 with simple for loop later. Cheers, Tao