I wanted to test how __restrict helps code generation. I started with
this example:
struct s { int a; int b; };

inline
void encode(int a, char* p) {
    // copy the bytes of a into p, one at a time
    for (unsigned i = 0; i < sizeof(a); ++i) {
        p[i] = reinterpret_cast<const char*>(&a)[i];
    }
}

void f(s* x, char* p) {
    encode(x->a, p + 0);
    encode(x->b, p + 4);   // assumes sizeof(int) == 4
}
This simulates serialization code. I expected that without __restrict
I would get four instructions:
mov (%rdi), %eax
mov %eax, (%rsi)
mov 4(%rdi), %eax
mov %eax, 4(%rsi)
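Because x and p can alias, x->b has to be loaded after the bytes of the
first word are stored. But a and p cannot alias, because a is a local
variable (encode() receives it by value), so each word's four bytes can
be stored with a single instruction.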
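To make the aliasing argument concrete, here is a small sketch (not from the original post) of a legal call where x and p overlap: p points at x->b, so the first encode() overwrites x->b and the second encode() must see the new value. That is why, without __restrict, the compiler cannot load both words up front. It assumes a 4-byte int, as the hard-coded p + 4 already does; overlapping_call() and check_overlap() are hypothetical names.

```cpp
#include <cassert>
#include <cstring>

// Assumption: 4-byte int, as the hard-coded p + 4 in f() already requires.
static_assert(sizeof(int) == 4, "example assumes a 4-byte int");

struct s { int a; int b; };

inline void encode(int a, char* p) {
    // Copy the bytes of a into p one at a time.
    for (unsigned i = 0; i < sizeof(a); ++i) {
        p[i] = reinterpret_cast<const char*>(&a)[i];
    }
}

void f(s* x, char* p) {
    encode(x->a, p + 0);
    encode(x->b, p + 4);
}

// Hypothetical helper: calls f() with p pointing at x->b, so the first
// encode() overwrites x->b before the second encode() reads it.
// Returns the int that f() stored at p + 4.
int overlapping_call(int a, int b) {
    s buf[2] = {{a, b}, {0, 0}};                   // room for p + 0 .. p + 7
    char* p = reinterpret_cast<char*>(&buf[0].b);  // p aliases x->b
    f(&buf[0], p);
    int out;
    std::memcpy(&out, p + 4, sizeof out);          // bytes written by encode(x->b, ...)
    return out;
}

// Usage: if x->b is correctly reloaded after the first store, the second
// word written is the original x->a, not the original x->b.
void check_overlap() {
    assert(overlapping_call(1, 2) == 1);
}
```

A compiler that hoisted the load of x->b above the first store would return 2 here instead of 1.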
I further hoped that adding __restrict would merge the two copies into a
single 8-byte load and store:
mov (%rdi), %rax
mov %rax, (%rsi)
since the compiler would then know that x and p do not alias.
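For reference, a sketch of the __restrict variant. The post does not show where the qualifier was placed, so putting it on both parameters of a function named f_restrict (a hypothetical name) is an assumption. __restrict is a compiler extension supported by GCC and Clang, not standard C++; it is a promise from the caller that x and p do not refer to overlapping memory.

```cpp
#include <cassert>
#include <cstring>

// Assumption: 4-byte int, as the hard-coded p + 4 already requires.
static_assert(sizeof(int) == 4, "example assumes a 4-byte int");

struct s { int a; int b; };

inline void encode(int a, char* p) {
    // Copy the bytes of a into p one at a time.
    for (unsigned i = 0; i < sizeof(a); ++i) {
        p[i] = reinterpret_cast<const char*>(&a)[i];
    }
}

// Same body as f(), but with both pointers __restrict-qualified, so stores
// through p are assumed not to modify *x (in particular x->b).
void f_restrict(s* __restrict x, char* __restrict p) {
    encode(x->a, p + 0);
    encode(x->b, p + 4);
}

// Usage with non-overlapping buffers, the only valid way to call it.
void check_restrict() {
    s x = {1, 2};
    char out[8];
    f_restrict(&x, out);
    int a, b;
    std::memcpy(&a, out + 0, sizeof a);
    std::memcpy(&b, out + 4, sizeof b);
    assert(a == 1 && b == 2);
}
```

Calling f_restrict() with overlapping arguments, as in an aliasing test, would be undefined behavior.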
However, the code GCC actually generates at -O2 is much worse:
0: 8b 07 mov (%rdi),%eax
2: 89 c1 mov %eax,%ecx
4: 88 06 mov %al,(%rsi)
6: 66 c1 e9 08 shr $0x8,%cx
a: 88 4e 01 mov %cl,0x1(%rsi)
d: 89 c1 mov %eax,%ecx
f: c1 e8 18 shr $0x18,%eax
12: c1 e9 10 shr $0x10,%ecx
15: 88 46 03 mov %al,0x3(%rsi)
18: 88 4e 02 mov %cl,0x2(%rsi)
1b: 8b 47 04 mov 0x4(%rdi),%eax
1e: 89 c7 mov %eax,%edi
20: 89 c1 mov %eax,%ecx
22: 88 46 04 mov %al,0x4(%rsi)
25: 66 c1 ef 08 shr $0x8,%di
29: c1 e9 10 shr $0x10,%ecx
2c: c1 e8 18 shr $0x18,%eax
2f: 40 88 7e 05 mov %dil,0x5(%rsi)
33: 88 4e 06 mov %cl,0x6(%rsi)
36: 88 46 07 mov %al,0x7(%rsi)
GCC does not even recognize the idiom of writing a word's four bytes one
after another. With -O3 there is some improvement:
0: 8b 07 mov (%rdi),%eax
2: 89 06 mov %eax,(%rsi)
4: 8b 47 04 mov 0x4(%rdi),%eax
7: 89 c1 mov %eax,%ecx
9: 88 46 04 mov %al,0x4(%rsi)
c: 66 c1 e9 08 shr $0x8,%cx
10: 88 4e 05 mov %cl,0x5(%rsi)
13: 89 c1 mov %eax,%ecx
15: c1 e8 18 shr $0x18,%eax
18: c1 e9 10 shr $0x10,%ecx
1b: 88 46 07 mov %al,0x7(%rsi)
1e: 88 4e 06 mov %cl,0x6(%rsi)
The copy of the first word is turned into a single 32-bit store, but the
copy of the second word is not, even though the two are identical.
Adding __restrict did not help.
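A variant the post does not try is std::memcpy with a constant size, a common way to express this kind of byte copy. This is only a sketch, assuming a 4-byte int; encode_memcpy() and f_memcpy() are hypothetical names. Compilers commonly replace a fixed-size memcpy with a plain store, but whether GCC 5.3.1 then merges the two stores is not something this sketch establishes.

```cpp
#include <cassert>
#include <cstring>

// Assumption: 4-byte int, as the hard-coded p + 4 already requires.
static_assert(sizeof(int) == 4, "example assumes a 4-byte int");

struct s { int a; int b; };

// Hypothetical variant of encode(): copies the object representation of a
// into p in a single call with a compile-time-constant size.
inline void encode_memcpy(int a, char* p) {
    std::memcpy(p, &a, sizeof a);  // a is a local copy, so no overlap with p
}

void f_memcpy(s* x, char* p) {
    encode_memcpy(x->a, p + 0);
    encode_memcpy(x->b, p + 4);
}

// Usage: the bytes written match the in-memory representation of x
// (two 4-byte ints, no padding), independent of byte order.
void check_memcpy() {
    s x = {0x01020304, 0x05060708};
    char out[8];
    f_memcpy(&x, out);
    assert(std::memcmp(out, &x, sizeof out) == 0);
}
```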
Is this a problem in GCC, or are my expectations incorrect? I'm
particularly worried that GCC recognized the copy idiom but did not
apply it to the second word, and that it needed -O3 to apply it at all.
Replacing the byte loop with std::copy_n() helped, but __restrict still
did not:
0: 8b 07 mov (%rdi),%eax
2: 89 06 mov %eax,(%rsi)
4: 8b 47 04 mov 0x4(%rdi),%eax
7: 89 46 04 mov %eax,0x4(%rsi)
so the two 4-byte copies are still not merged into a single 8-byte copy.
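The std::copy_n() version is not shown in the post; the following is one plausible reconstruction, assuming the loop in encode() was replaced by a single std::copy_n over the bytes of a. The names encode_copy_n(), f_copy_n() and check_copy_n() are hypothetical, and a 4-byte int is assumed as before.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>

// Assumption: 4-byte int, as the hard-coded p + 4 already requires.
static_assert(sizeof(int) == 4, "example assumes a 4-byte int");

struct s { int a; int b; };

// Hypothetical reconstruction: copy sizeof(a) bytes of a's object
// representation into p with std::copy_n instead of a hand-written loop.
inline void encode_copy_n(int a, char* p) {
    std::copy_n(reinterpret_cast<const char*>(&a), sizeof a, p);
}

void f_copy_n(s* x, char* p) {
    encode_copy_n(x->a, p + 0);
    encode_copy_n(x->b, p + 4);
}

// Usage: round-trip both fields through the output buffer.
void check_copy_n() {
    s x = {-1, 42};
    char out[8];
    f_copy_n(&x, out);
    int a, b;
    std::memcpy(&a, out + 0, sizeof a);
    std::memcpy(&b, out + 4, sizeof b);
    assert(a == -1 && b == 42);
}
```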
gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6)