On Wed, Oct 17, 2012 at 5:35 PM, Christoffer Dall <c.dall@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> also, I'm not sure this will be a cycle in difference, the add takes
> place anyway, just as part of the other instruction, so my knowledge
> of the cpu internals comes to a short here. But it's really splitting
> hairs.
Actually, ldrd takes the same number of cycles to execute with or without an immediate offset.
However, it seems that Cortex-A9 (and, I think, A15 too) has an optimized path from a load instruction to a subsequent data-processing instruction,
saving 1 cycle on the load-use penalty. So in the end, (ldr+add+ldrd) should take the same cycle count as (ldr+ldrd[#imm]).
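To make the comparison concrete, here is a rough sketch of the two sequences I mean. The registers and the #16 offset are made up for illustration and are not taken from the patch, and I have not measured either variant:

```
@ Variant A: ldr + add + ldrd (hypothetical registers and offset)
ldr   r2, [r0]            @ load the base pointer
add   r2, r2, #16         @ data-processing op consumes the loaded value
ldrd  r4, r5, [r2]        @ 64-bit load, no immediate offset

@ Variant B: ldr + ldrd[#imm] (same hypothetical registers and offset)
ldr   r2, [r0]            @ load the base pointer
ldrd  r4, r5, [r2, #16]   @ loaded value is used directly as the load address
```

In variant A the loaded value goes first to the add, which is where the shorter load-use penalty would apply if my reading is right. In variant B it goes straight into the address of the ldrd.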
Anyway, sorry for the slightly stupid questions.
_______________________________________________ kvmarm mailing list kvmarm@xxxxxxxxxxxxxxxxxxxxx https://lists.cs.columbia.edu/cucslists/listinfo/kvmarm