On 2024-04-10 01:26, zamfofex wrote:
The flags I tested were ‘-O3’ vs. ‘-Oz’ and ‘-m32’ vs. none (four combinations per compiler). In GCC, under ‘-m32 -Oz’ the assembly for the two functions differed but was the same size (in bytes, once assembled). Under ‘-Oz’ without ‘-m32’, the first one was larger.
The first piece of code involves two sign-extension operations, as in `char* p = (char*) x; return *(p + (ptrdiff_t) i * 48 + (ptrdiff_t) j * 4);`, and the second involves only one, as in `char* p = (char*) x; return *(p + (ptrdiff_t) (i * 48 + j * 4));`. For ‘-m32’ the assembly differs a little, but as far as I can tell there is almost no difference.
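The two functions themselves were not posted, so the following is only a sketch of what they might look like. It assumes `x` points to rows of 12 `int`s (so 48 bytes per row with a 4-byte `int`), and the names `load_wide`, `load_narrow` and `COLS` are hypothetical. The first variant widens each index separately; the second computes the byte offset in `int` and widens it once.

```c
#include <stddef.h>

/* Hypothetical layout: rows of 12 ints, i.e. 48 bytes per row with 4-byte int. */
enum { COLS = 12 };

/* First variant: each index is widened to ptrdiff_t (one sign extension
   per index on x86-64), then scaled to a byte offset. */
int load_wide(int (*x)[COLS], int i, int j)
{
    char *p = (char *) x;
    return *(int *) (p + (ptrdiff_t) i * (ptrdiff_t) sizeof x[0]
                       + (ptrdiff_t) j * (ptrdiff_t) sizeof x[0][0]);
}

/* Second variant: the byte offset is computed in int and widened once.
   The int arithmetic can overflow for large i, which the first variant avoids. */
int load_narrow(int (*x)[COLS], int i, int j)
{
    char *p = (char *) x;
    return *(int *) (p + (ptrdiff_t) (i * (int) sizeof x[0]
                                        + j * (int) sizeof x[0][0]));
}
```

For in-range, non-negative indices both return `x[i][j]`; the only difference is where the widening to pointer width happens.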
Is this a missed size optimisation for x86-64? Even when the assembly code is larger, the difference in run time seems to be unmeasurable. (Though I would have expected the larger one to be slower in each case.)
Maybe. My suggestion is to avoid `int` for subscripts on x86-64, as it requires unnecessary sign extensions.
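A minimal illustration of that suggestion, assuming the same row-of-12-`int`s layout (the names `load_int`, `load_ptrdiff` and `COLS` are hypothetical). Comparing the output of `gcc -O2 -S` for the two functions should typically show a `movsx`/`movslq` for each `int` index and none for the `ptrdiff_t` version, although the exact instruction sequence depends on the compiler version and flags.

```c
#include <stddef.h>

enum { COLS = 12 }; /* hypothetical row length */

/* int subscripts: on x86-64 each 32-bit index is typically sign-extended
   to 64 bits before it can be used in the address computation. */
int load_int(int (*x)[COLS], int i, int j)
{
    return x[i][j];
}

/* ptrdiff_t subscripts are already pointer-width on x86-64,
   so no sign extension is needed. */
int load_ptrdiff(int (*x)[COLS], ptrdiff_t i, ptrdiff_t j)
{
    return x[i][j];
}
```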
And larger code is not always slower. Intel CPUs recognize many dependency-breaking idioms (such as `xor eax, eax`, and similarly `xorps xmm0, xmm0`), which can make larger code faster.
-- Best regards, LIU Hao