Re: [PATCH v4 0/3] m68k: Improved switch stack handling

Michael Schmitz <schmitzmic@xxxxxxxxx> · Mon, 19 Jul 2021 15:15:44 +1200

Hi Brad,

On 19.07.2021 at 08:59, Brad Boyer wrote:
On Mon, Jul 19, 2021 at 07:47:19AM +1200, Michael Schmitz wrote:
Somewhere in entry.S is

addql   #8,%sp
addql   #4,%sp

- is that faster than

lea     12(%sp),%sp ?

On the 68040 the timing can depend on the other instructions around
it. Each of those addql instructions is listed as 1 and 1 for
fetch/execute, while that lea is listed as 2 and 1L+1, meaning that
it could potentially be faster depending on how the preceding
instruction moves through the execute stage. That one free cycle
if the stage is busy (due to the 1L) could make lea effectively
faster, since the first addql would have to wait that extra cycle
in that case.

On the 68060, it looks like the lea version is the clear winner,
although the timing description is obviously much more complicated
and thus I might have missed something. From a quick look, it
seems that lea takes the same time as just the first addql.

On CPU32, the lea version loses due to the extra 3 cycles from
the addressing mode, even though the base cycles of lea are the
same as for addql (2 cycles each). The lea might be even worse
if it can't take advantage of overlapping the surrounding
instructions (1 cycle before and 1 after).

Those are the only ones for which I already have the documentation
at hand. I haven't checked the older classic cores or ColdFire, but
it does seem that which version is faster is specific to each chip.

Obviously both versions would be the same size (2 words).

Thanks, best leave it as is then.

Cheers,

	Michael