Nicolas Pitre <nico@xxxxxxxxxxx> writes:

> On Thu, 26 Nov 2015, Måns Rullgård wrote:
>
>> Nicolas Pitre <nico@xxxxxxxxxxx> writes:
>>
>> > 3) In fact I was wondering if the overhead of the branch and back is
>> >    really significant compared to the non trivial cost of a idiv
>> >    instruction and all the complex infrastructure required to patch
>> >    those branches directly, and consequently if the performance
>> >    difference is actually worth it versus simply doing (2) alone.
>>
>> Depending on the operands, the div instruction can take as few as 3
>> cycles on a Cortex-A7.
>
> Even the current software based implementation can produce a result with
> about 5 simple ALU instructions depending on the operands.
>
> The average cycle count is more important than the easy-way-out case.
> And then how significant the two branches around it are compared to idiv
> alone from direct patching of every call to it.

If not calling the function saves an I-cache miss, the benefit can be
substantial.  No, I have no proof of this being a problem, but it's
something that could happen.

Of course, none of this is going to be as good as letting the compiler
generate div instructions directly.

--
Måns Rullgård
mans@xxxxxxxxx