Surprisingly bad code generated near char*

I wanted to test how __restrict helps code generation. I started with this example:


  struct s { int a; int b; };

  // Copy the bytes of a into p, one char at a time.
  inline
  void encode(int a, char* p) {
    for (unsigned i = 0; i < sizeof(a); ++i) {
      p[i] = reinterpret_cast<const char*>(&a)[i];
    }
  }

  // Serialize both fields of *x into the buffer at p.
  void f(s* x, char* p) {
    encode(x->a, p + 0);
    encode(x->b, p + 4);
  }


This simulates serialization code. I expected that, even without __restrict, I would get four instructions:


   mov (%rdi), %eax

   mov %eax, (%rsi)

   mov 4(%rdi), %eax

   mov %eax, 4(%rsi)


While x and p can alias, a and p cannot, because a is a local variable (the by-value parameter of encode). I further hoped that adding __restrict would remove two instructions, leaving a single 8-byte load and store:


   mov (%rdi), %rax

   mov %rax, (%rsi)


since the compiler now knows that x and p do not alias.
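
For reference, the __restrict variant might look like the sketch below. The post does not show where the qualifier was placed, so putting it on both parameters of f and on the pointer parameter of encode is an assumption. The names f_restrict and check_roundtrip are hypothetical, and __restrict is a compiler extension in C++, not standard syntax.

```cpp
#include <cassert>
#include <cstring>

struct s { int a; int b; };

// The offsets below assume a 4-byte int, as on x86-64.
static_assert(sizeof(int) == 4, "example assumes 4-byte int");

// Same byte-wise loop as in the post; only the qualifier is new.
inline void encode(int a, char* __restrict p) {
  for (unsigned i = 0; i < sizeof(a); ++i) {
    p[i] = reinterpret_cast<const char*>(&a)[i];
  }
}

// Assumed placement: promise that x and p never refer to the same memory.
void f_restrict(const s* __restrict x, char* __restrict p) {
  encode(x->a, p + 0);
  encode(x->b, p + 4);
}

// Hypothetical self-check: the buffer must match the in-memory layout of s.
void check_roundtrip() {
  s v{0x01020304, 0x05060708};
  char buf[sizeof(s)] = {};
  f_restrict(&v, buf);
  assert(std::memcmp(buf, &v, sizeof(s)) == 0);
}
```

Compiling this with g++ -O2 -S (or -O3) and reading the output for f_restrict is the way to see what a given gcc version does with the qualifier. The sketch only fixes the semantics, not the generated code.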


However, the code generated at -O2 is much worse than this:


   0:    8b 07                    mov    (%rdi),%eax
   2:    89 c1                    mov    %eax,%ecx
   4:    88 06                    mov    %al,(%rsi)
   6:    66 c1 e9 08              shr    $0x8,%cx
   a:    88 4e 01                 mov    %cl,0x1(%rsi)
   d:    89 c1                    mov    %eax,%ecx
   f:    c1 e8 18                 shr    $0x18,%eax
  12:    c1 e9 10                 shr    $0x10,%ecx
  15:    88 46 03                 mov    %al,0x3(%rsi)
  18:    88 4e 02                 mov    %cl,0x2(%rsi)
  1b:    8b 47 04                 mov    0x4(%rdi),%eax
  1e:    89 c7                    mov    %eax,%edi
  20:    89 c1                    mov    %eax,%ecx
  22:    88 46 04                 mov    %al,0x4(%rsi)
  25:    66 c1 ef 08              shr    $0x8,%di
  29:    c1 e9 10                 shr    $0x10,%ecx
  2c:    c1 e8 18                 shr    $0x18,%eax
  2f:    40 88 7e 05              mov    %dil,0x5(%rsi)
  33:    88 4e 06                 mov    %cl,0x6(%rsi)
  36:    88 46 07                 mov    %al,0x7(%rsi)


gcc doesn't even recognize the idiom of writing a word's four bytes sequentially. With -O3, there is some improvement:


   0:    8b 07                    mov    (%rdi),%eax
   2:    89 06                    mov    %eax,(%rsi)
   4:    8b 47 04                 mov    0x4(%rdi),%eax
   7:    89 c1                    mov    %eax,%ecx
   9:    88 46 04                 mov    %al,0x4(%rsi)
   c:    66 c1 e9 08              shr    $0x8,%cx
  10:    88 4e 05                 mov    %cl,0x5(%rsi)
  13:    89 c1                    mov    %eax,%ecx
  15:    c1 e8 18                 shr    $0x18,%eax
  18:    c1 e9 10                 shr    $0x10,%ecx
  1b:    88 46 07                 mov    %al,0x7(%rsi)
  1e:    88 4e 06                 mov    %cl,0x6(%rsi)

The copy of the first word is optimized, but the copy of the second is not, even though the two are exactly the same.


Adding __restrict did not help.


Is this a problem in gcc, or are my expectations incorrect? I'm particularly worried that gcc recognized the copy idiom, but did not apply it to the second word, and required -O3 to optimize it.


Using std::copy_n() helped, but __restrict did not:


   0:    8b 07                    mov    (%rdi),%eax
   2:    89 06                    mov    %eax,(%rsi)
   4:    8b 47 04                 mov    0x4(%rdi),%eax
   7:    89 46 04                 mov    %eax,0x4(%rsi)

So the two 4-byte copies are still not merged into a single 8-byte copy, and the optimization opportunity is still missed.
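
A minimal sketch of the std::copy_n() variant follows, assuming it simply replaces the hand-written loop inside encode. The post does not show this code, so the exact form, and the names encode_copy, f_copy and check_copy, are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>

struct s { int a; int b; };

// The offsets below assume a 4-byte int, as on x86-64.
static_assert(sizeof(int) == 4, "example assumes 4-byte int");

// Copy the sizeof(a) bytes of a into p using the standard algorithm
// instead of an explicit loop.
inline void encode_copy(int a, char* p) {
  std::copy_n(reinterpret_cast<const char*>(&a), sizeof(a), p);
}

// Serialize both fields of *x into the buffer at p.
void f_copy(const s* x, char* p) {
  encode_copy(x->a, p + 0);
  encode_copy(x->b, p + 4);
}

// Hypothetical self-check: the buffer must match the in-memory layout of s.
void check_copy() {
  s v{0x01020304, 0x05060708};
  char buf[sizeof(s)] = {};
  f_copy(&v, buf);
  assert(std::memcmp(buf, &v, sizeof(s)) == 0);
}
```

Adding __restrict to the parameters of f_copy gives the variant whose output is compared against the four-instruction listing above.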


gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6)



