Re: restrict leaving byte copies unoptimized

> Good questions, but I don't know all the answers.  The nop
> instructions are there because the gcc instruction scheduler is
> creating groups which are intended to be optimal for the instruction
> dispatcher.  You should make sure that you are using a -mtune option
> that corresponds to the processor you are using, to make sure that gcc
> is doing something that is appropriate there.

Indeed, changing to -mtune=G4 (or G3) gets rid of the nops that appear in the -mtune=G5 and default versions. It still produces the stalling load/store pairs, though, rather than the interleaved load/load/.../store/store sequence seen with the word-op code:
   ...
       lbz r0,1(r2)
       stb r0,1(r9)
       lbz r11,2(r2)
       stb r11,2(r9)
       lbz r0,3(r2)
       stb r0,3(r9)
       lbz r11,4(r2)
       stb r11,4(r9)
       lbz r0,5(r2)
       stb r0,5(r9)
   ...
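
The test source itself isn't reproduced in this message, so as a rough illustration only, here is a minimal sketch of the kind of byte-copy and word-copy pair being compared. The function names, the unroll factor of four, and the fixed-width types are my assumptions, not the code from earlier in the thread. With restrict on both pointers, the compiler is allowed to hoist all the loads in an unrolled group above the stores in either version; the listing above suggests it only did so for the word case.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte-wise copy. The restrict qualifiers promise that dst and
   src do not overlap, so the four loads in each group may legally be
   scheduled before any of the four stores. */
void copy_bytes(uint8_t *restrict dst, const uint8_t *restrict src, size_t n)
{
    size_t i;

    /* Manually unrolled by four (an assumed factor, for illustration). */
    for (i = 0; i + 4 <= n; i += 4) {
        dst[i]     = src[i];
        dst[i + 1] = src[i + 1];
        dst[i + 2] = src[i + 2];
        dst[i + 3] = src[i + 3];
    }

    /* Copy the remaining zero to three bytes. */
    for (; i < n; i++)
        dst[i] = src[i];
}

/* Hypothetical word-wise copy with the same shape; n counts 32-bit words. */
void copy_words(uint32_t *restrict dst, const uint32_t *restrict src, size_t n)
{
    size_t i;

    for (i = 0; i + 4 <= n; i += 4) {
        dst[i]     = src[i];
        dst[i + 1] = src[i + 1];
        dst[i + 2] = src[i + 2];
        dst[i + 3] = src[i + 3];
    }

    for (; i < n; i++)
        dst[i] = src[i];
}
```

Compiling something like this to assembly once per -mtune setting and comparing the two function bodies should show whether the byte version still comes out as strict lbz/stb pairs.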

> gcc has gotten steadily better support for the restrict qualifier, but
> it still doesn't work as well as it should.  In gcc 4.2 it did very
> little.

It does make me wonder whether the restrict qualifier optimizations were simply implemented for word operations and not for byte operations, whether there are good reasons for that, or whether something about the G5 architecture prefers nops in the pipeline over stalled load/store sequences.

Interesting stuff; thanks for the pointers,
--
 Dan

