Re: simple optimisation question

LIU Hao via Gcc-help <gcc-help@xxxxxxxxxxx> · Wed, 10 Apr 2024 09:26:11 +0800

On 2024-04-10 01:26, zamfofex wrote:
The flags I tested were ‘-O3’ vs. ‘-Oz’ and ‘-m32’ vs. none. (Four combinations per compiler.)

In GCC, the assembly code, although different, was of the same size (in bytes, after assembly) for both functions under ‘-m32 -Oz’. For ‘-Oz’ without ‘-m32’, the first one was larger.

The first piece of code involves two sign-extension operations, as in

   char* p = (char*) x;
   return *(p + (ptrdiff_t) i * 48 + (ptrdiff_t) j * 4);

and the second one involves only one, as in

   char* p = (char*) x;
   return *(p + (ptrdiff_t) (i * 48 + j * 4));

With ‘-m32’, `int` and `ptrdiff_t` are both 32 bits wide, so neither version needs a sign extension. The assembly differs a little, but as far as I can tell there is almost no difference.
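For anyone who wants to reproduce the comparison, here is a minimal sketch of the two address computations wrapped in compilable functions. The function names, the `const int *` parameter and the layout of 12 `int`s per row (48 bytes per row, 4 bytes per element) are assumptions inferred from the constants, not the original poster's code. Compiling it with `gcc -Oz -S` and `gcc -m32 -Oz -S` (‘-Oz’ needs GCC 12 or later) puts the generated assembly for both functions side by side.

```c
#include <stddef.h>

/* Hypothetical reconstruction: x points to a row-major array with
   12 ints per row, i.e. 48 bytes per row and 4 bytes per element. */

/* Two sign extensions: i and j are each widened to ptrdiff_t
   before being scaled. */
int load_two_ext(const int *x, int i, int j)
{
    const char *p = (const char *) x;
    return *(const int *) (p + (ptrdiff_t) i * 48 + (ptrdiff_t) j * 4);
}

/* One sign extension: the byte offset is computed in int and widened
   once. This is only equivalent while i * 48 + j * 4 fits in an int. */
int load_one_ext(const int *x, int i, int j)
{
    const char *p = (const char *) x;
    return *(const int *) (p + (ptrdiff_t) (i * 48 + j * 4));
}
```

Both functions read the same element: with a 36-element `int` array whose element k holds k, calling either one with `i = 2, j = 5` reads byte offset 116 and returns 29.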

Is this a missed size optimisation for x86-64? Even in the case where the assembly code is larger, the difference in run time seems unobservable. (Though I’d have imagined the larger one would have been slower in each case.)

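Maybe. My suggestion is to avoid `int` as a subscript type on x86-64: a 32-bit `int` index has to be sign-extended to 64 bits before it can take part in an address computation, which costs extra instructions. A pointer-sized type such as `ptrdiff_t` avoids that.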
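A minimal sketch of that suggestion, assuming a row-major array with 12 `int`s per row (the name `load_ptrdiff` is hypothetical): taking the subscripts as `ptrdiff_t` means they are already pointer-width on x86-64, so there is nothing left to sign-extend before the address is formed. Whether this actually shrinks the code for a given compiler and optimisation level is best checked with `-S`.

```c
#include <stddef.h>

/* Hypothetical example: x points to a row-major array with 12 ints
   per row. i and j are already pointer-sized (ptrdiff_t), so no
   int-to-64-bit sign extension is needed to compute the address. */
int load_ptrdiff(const int *x, ptrdiff_t i, ptrdiff_t j)
{
    return x[i * 12 + j];
}
```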

And larger code is not always slower. Intel CPUs recognize many idioms that break dependencies (such as `xor eax, eax`, and similarly `xorps xmm0, xmm0`); an extra instruction like this adds bytes but can remove a false dependency on a register's previous value, so the larger code may run faster.

--
Best regards,
LIU Hao