On Mon, Jul 19, 2021 at 07:47:19AM +1200, Michael Schmitz wrote:
> Somewhere in entry.S is
>
>	addql #8,%sp
>	addql #4,%sp
>
> - is that faster than
>
>	lea 12(%sp),%sp
>
> ?

On the 68040 the timing can depend on the other instructions around
it. Each of those addql instructions is listed as 1 and 1 for
fetch/execute, while that lea is listed as 2 and 1L+1, meaning that it
could potentially be faster depending on the behavior of the
instruction that preceded it through the execute stage. That one free
cycle if the stage is busy (due to the 1L) could make it effectively
faster, since the first addql would have to wait that extra cycle in
that case.

On the 68060, it looks like the lea version is the clear winner,
although the timing description is obviously much more complicated and
thus I might have missed something. From a quick look, it seems that
lea takes the same time as just the first addql.

On CPU32, the lea version loses due to the extra 3 cycles from the
addressing mode, even though the base cycles of lea are the same as
for addql (2 cycles each). The lea might be even worse if it can't
take advantage of overlapping the surrounding instructions (1 cycle
before and 1 after).

Those are the only chips I have documentation on hand for. I haven't
checked older classic cores or coldfire, but it does seem that which
version is faster is specific to each chip. Obviously both versions
would be the same size (2 words).

	Brad Boyer
	flar@xxxxxxxxxxxxx