On 11/29/2017 03:22 AM, Marc Glisse wrote: > On Wed, 29 Nov 2017, Mason wrote: > >> Hello, >> >> Consider the following code: >> >> #include <limits.h> >> >> #if CHAR_BIT != 8 >> #error "Only octets (8-bit bytes) are supported" >> #endif >> >> static const char map[16] = "xyz"; >> >> void foo(unsigned char *src, char *buf) >> { >> int hi = src[0] / 16; >> int lo = src[0] % 16; >> buf[0] = map[hi]; >> buf[1] = map[lo]; >> } >> >> >> $ gcc-7 -O2 -S -march=haswell -Wall -Wextra foo.c >> >> foo: >> movzbl (%rdi), %eax # eax = src[0] >> movl %eax, %edx # edx = src[0] >> andl $15, %eax # eax = src[0] % 16 = lo >> shrb $4, %dl # edx = src[0] / 16 = hi >> movzbl map(%rax), %eax # eax = map[lo] >> andl $15, %edx # Unnecessary, see discussion below >> movzbl map(%rdx), %edx >> movb %al, 1(%rsi) >> movb %dl, (%rsi) >> ret >> >> >> Since CHAR_BIT is 8, it is guaranteed that 0 <= src[0] < 256 >> therefore 0 <= hi < 16 >> therefore "andl $15, %edx" is a no-op and could be removed. >> >> >> In fact, if I change the program to: >> >> #include <limits.h> >> >> #if CHAR_BIT != 8 >> #error "Only octets (8-bit bytes) are supported" >> #endif >> >> static const char map[16] = "xyz"; >> >> void foo(unsigned char *src, char *buf) >> { >> int hi = src[0] / 16; >> int lo = src[0] % 16; >> buf[0] = hi; >> buf[1] = map[lo]; >> } >> >> Then the generated code is >> >> foo: >> movzbl (%rdi), %eax >> movl %eax, %edx >> andl $15, %eax >> movzbl map(%rax), %eax >> shrb $4, %dl >> movb %dl, (%rsi) >> movb %al, 1(%rsi) >> ret >> >> >> In that case, GCC doesn't emit the unnecessary instruction. >> >> Is this easily fixable? (FWIW, trunk on godbolt has the same behavior.) > > Writing > > long hi = src[0]; > hi /= 16; > > also helps, the extra &15 comes from extending from unsigned char to > long. I don't know how easy/hard it is to improve, but I am sure it is > more likely to happen if you file an issue in bugzilla ;-) Certainly more likely to get addressed if a bug is filed ;-) This is actually an RTL issue AFAICT. So it's going to be tougher to fix -- lack of range information and the dataflow is going to make it tough for the combiner to do anything. Jeff