On 10/13/2017 04:11 AM, Lukas Wunner wrote:
> [cc += William Breathitt Gray, Geert Uytterhoeven, Phil Reid, David Daney,
> Iban Rodriguez]
>
> The drivers gpio-104-dio-48e.c, gpio-74x164.c, gpio-gpio-mm.c,
> gpio-pca953x.c, gpio-thunderx.c, gpio-ws16c48.c, gpio-xilinx.c
> currently use an inefficient algorithm for their ->set_multiple
> callback: They iterate over every chip (or bank or gpio pin), check if
> any bits are set in the mask for this particular chip, and if so,
> update the affected GPIOs. If the mask is sparsely populated, you'll
> waste CPU time checking chips even though they're not affected by the
> operation at all.
>
> Iterating over the chips is backwards, it is more efficient to iterate
> over the bits set in the mask, identify the corresponding chip and
> update its affected GPIOs. The gpio-max3191x.c driver I posted
> yesterday contains an example for such an algorithm and you may want
> to improve your ->set_multiple implementation accordingly:

Do you have any profiling results that demonstrate significant system-wide performance improvements by making such a change?
For the gpio-thunderx.c driver, the words in the bits/mask exactly match the banks, so the number of iterations doesn't change with the approach you suggest.
In fact, an argument could be made in the other direction. The increased ICache pressure from code bloat and branch prediction misses resulting from extra testing of the mask bits could easily make the system slower using your suggestion.
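
To make the iteration-count point concrete, here is a rough user-space sketch, not the gpio-thunderx.c code. It assumes one 64-bit mask word per bank; struct demo_bank, set_multiple_by_bank(), set_multiple_by_mask() and find_next_set_bit() are hypothetical names made up for this illustration. Both functions return the number of mask words they had to load, which is the work being compared. The sketch says nothing about ICache or branch prediction effects; those would need real profiling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BANK_BITS 64 /* assumption: one 64-bit mask word per bank */

/* Hypothetical stand-in for one bank's output register. */
struct demo_bank {
	uint64_t value;
};

/*
 * Bank-driven approach: visit every bank and skip those whose mask
 * word is empty. Returns the number of mask words loaded.
 */
unsigned int set_multiple_by_bank(struct demo_bank *banks, size_t nbanks,
				  const uint64_t *mask, const uint64_t *bits)
{
	unsigned int words_examined = 0;

	for (size_t i = 0; i < nbanks; i++) {
		words_examined++;
		if (!mask[i])
			continue;
		banks[i].value = (banks[i].value & ~mask[i]) |
				 (bits[i] & mask[i]);
	}
	return words_examined;
}

/*
 * Simplified bit search: scan word by word from 'start' and return the
 * index of the next set bit, or nbits if there is none. Every word that
 * has to be loaded is counted in *words_examined.
 */
static size_t find_next_set_bit(const uint64_t *mask, size_t nbits,
				size_t start, unsigned int *words_examined)
{
	assert(words_examined != NULL);

	while (start < nbits) {
		size_t word = start / BANK_BITS;
		uint64_t w = mask[word] >> (start % BANK_BITS);

		(*words_examined)++;
		if (w) {
			size_t off = 0;

			while (!(w & 1)) {
				w >>= 1;
				off++;
			}
			return start + off;
		}
		start = (word + 1) * BANK_BITS;
	}
	return nbits;
}

/*
 * Mask-driven approach: find the next set bit, update the bank it
 * belongs to, then resume the search at the next bank boundary.
 * Returns the number of mask words loaded while searching.
 */
unsigned int set_multiple_by_mask(struct demo_bank *banks, size_t nbanks,
				  const uint64_t *mask, const uint64_t *bits)
{
	unsigned int words_examined = 0;
	size_t nbits = nbanks * BANK_BITS;
	size_t bit = find_next_set_bit(mask, nbits, 0, &words_examined);

	while (bit < nbits) {
		size_t i = bit / BANK_BITS;

		banks[i].value = (banks[i].value & ~mask[i]) |
				 (bits[i] & mask[i]);
		bit = find_next_set_bit(mask, nbits, (i + 1) * BANK_BITS,
					&words_examined);
	}
	return words_examined;
}
```

Under these assumptions, both functions load each mask word exactly once, whether the mask is sparse or dense, because finding the next set bit still means walking the mask word by word.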
Thanks,
David Daney