Hey there!

I just noticed that gcc fails to merge switch cases that all compute the
same value, but only in some situations. I built a small program to
illustrate it and tested it with gcc 6.4 and 7.2. The source and the
resulting disassembly of main follow:

    #include <stdio.h>

    union u_test {
        struct s_a {
            int field1, field2;
        } a;
        struct s_b {
            int field3, field4;
        } b;
        struct s_c {
            int field5, field6;
        } c;
    };

    union u_test *example;

    int main(int argc, char **argv) {
        const int *ptr;

        switch (argc) {
        case 0: ptr = &example->a.field2; break;
        case 1: ptr = &example->b.field4; break;
        case 2: ptr = &example->c.field6; break;
        default: return 0;
        }
        printf("%d\n", *ptr);
    }

    0000000000400400 <main>:
      400400:  83 ff 01              cmp    $0x1,%edi
      400403:  74 13                 je     400418 <main+0x18>
      400405:  83 ff 02              cmp    $0x2,%edi
      400408:  74 0e                 je     400418 <main+0x18>
      40040a:  85 ff                 test   %edi,%edi
      40040c:  74 0a                 je     400418 <main+0x18>
      40040e:  31 c0                 xor    %eax,%eax
      400410:  c3                    retq
      400411:  0f 1f 80 00 00 00 00  nopl   0x0(%rax)
      400418:  48 8b 05 11 0c 20 00  mov    0x200c11(%rip),%rax
      40041f:  bf b0 05 40 00        mov    $0x4005b0,%edi
      400424:  8b 70 04              mov    0x4(%rax),%esi
      400427:  48 83 ec 08           sub    $0x8,%rsp
      40042b:  31 c0                 xor    %eax,%eax
      40042d:  e8 be ff ff ff        callq  4003f0 <printf@plt>
      400432:  31 c0                 xor    %eax,%eax
      400434:  48 83 c4 08           add    $0x8,%rsp
      400438:  c3                    retq
      400439:  0f 1f 80 00 00 00 00  nopl   0x0(%rax)

As you can see, it generates a compare and a conditional jump for every
case even though all of them jump to the same address. Since I know gcc
handles switches with more than N cases differently, I also tested a
switch with 5 cases + default by adding more structs and cases. I got
shorter code, but it just uses a jump table whose entries are all
identical:

    0000000000400400 <main>:
      400400:  83 ff 04              cmp    $0x4,%edi
      400403:  77 2e                 ja     400433 <main+0x33>
      400405:  48 83 ec 08           sub    $0x8,%rsp
      400409:  48 8b 05 20 0c 20 00  mov    0x200c20(%rip),%rax
      400410:  89 ff                 mov    %edi,%edi
      400412:  8b 70 04              mov    0x4(%rax),%esi
      400415:  ff 24 fd b8 05 40 00  jmpq   *0x4005b8(,%rdi,8)
      40041c:  0f 1f 40 00           nopl   0x0(%rax)
      400420:  bf b0 05 40 00        mov    $0x4005b0,%edi
      400425:  31 c0                 xor    %eax,%eax
      400427:  e8 c4 ff ff ff        callq  4003f0 <printf@plt>
      40042c:  31 c0                 xor    %eax,%eax
      40042e:  48 83 c4 08           add    $0x8,%rsp
      400432:  c3                    retq
      400433:  31 c0                 xor    %eax,%eax
      400435:  c3                    retq
      400436:  66 2e 0f 1f 84 00 00  nopw   %cs:0x0(%rax,%rax,1)
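
For reference, a five-case variant along these lines shows the shape of
the test. This is only a sketch: the extra struct and field names (s_d,
s_e, field7..field10) and the function print_second are assumptions made
for illustration, and the switch is moved out of main so that it can be
called with any selector. The code generated for this exact form has not
been checked here.

```c
#include <stdio.h>

/* Five members with the same layout: two ints each. */
union u_test {
    struct s_a { int field1, field2; } a;
    struct s_b { int field3, field4; } b;
    struct s_c { int field5, field6; } c;
    struct s_d { int field7, field8; } d;   /* hypothetical extra member */
    struct s_e { int field9, field10; } e;  /* hypothetical extra member */
};

/* Prints the second int of the member selected by n.
 * Returns 1 if a value was printed, 0 for the default case. */
int print_second(const union u_test *u, int n)
{
    const int *ptr;

    switch (n) {
    case 0: ptr = &u->a.field2;  break;
    case 1: ptr = &u->b.field4;  break;
    case 2: ptr = &u->c.field6;  break;
    case 3: ptr = &u->d.field8;  break;
    case 4: ptr = &u->e.field10; break;
    default: return 0;
    }
    printf("%d\n", *ptr);
    return 1;
}
```

Calling print_second(example, argc) from main gives the same structure
as the original program.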

I tested it on other arches such as MIPS and the same thing happens, at
both -O2 and -O3. I guess there's no optimization that detects when two
cases are identical at this level? (I imagine that at some higher level
they are different, since they produce pointers to different things, but
there could be a pass that checks whether the generated assembly is
identical, or at least handles the degenerate case where all the table
entries are identical.)
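
The cases really do compute the same address: each struct holds two
ints, so the second field of every member sits at the same offset inside
the union (the 0x4 displacement in the listings). A minimal check,
assuming the three-member union from the test program; the helper names
same_offsets and same_addresses are made up for this sketch:

```c
#include <stddef.h>

union u_test {
    struct s_a { int field1, field2; } a;
    struct s_b { int field3, field4; } b;
    struct s_c { int field5, field6; } c;
};

/* Returns 1 when the three "second" fields share one byte offset. */
int same_offsets(void)
{
    return offsetof(union u_test, a.field2) == offsetof(union u_test, b.field4)
        && offsetof(union u_test, b.field4) == offsetof(union u_test, c.field6);
}

/* Returns 1 when the three pointers taken by the switch are equal for u. */
int same_addresses(union u_test *u)
{
    return &u->a.field2 == &u->b.field4
        && &u->b.field4 == &u->c.field6;
}
```

Both helpers should return 1 on any ordinary ABI, which is why every
switch arm ends up as the same load.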

If I use a chain of ifs (or a cascaded ? : sequence) instead, gcc detects
that all the resulting values are the same, so no jumps are needed. Even
when I use the final "else" to assign some null value, it takes
advantage of that in the subsequent code. It's interesting that
switch/case is treated specially and apparently is never optimized as if
it were a chain of if/else statements.
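
A sketch of the if/else form described above, written as a function so
that it can be called with any selector. The name print_second_if and
the parameterised form are assumptions for illustration; the test
described here presumably used main and argc directly.

```c
#include <stdio.h>

union u_test {
    struct s_a { int field1, field2; } a;
    struct s_b { int field3, field4; } b;
    struct s_c { int field5, field6; } c;
};

/* Same selection as the switch, expressed as an if/else chain.
 * Returns 1 if a value was printed, 0 otherwise. */
int print_second_if(const union u_test *u, int n)
{
    const int *ptr;

    if (n == 0)
        ptr = &u->a.field2;
    else if (n == 1)
        ptr = &u->b.field4;
    else if (n == 2)
        ptr = &u->c.field6;
    else
        return 0;

    printf("%d\n", *ptr);
    return 1;
}
```

Comparing the output of gcc -O2 -S for this function with the switch
version is a quick way to see the difference.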
Thanks!
David