On 29/11/2010 13:29, richardcavell@xxxxxxxx wrote:
Thanks, David. Some follow-up questions:
It is typically possible, but it depends on the compile-time flags you
use and the exact formulation of the source code. Don't expect that the
compiler will be able to read your mind here - there are all sorts of
subtleties that come into play with automatic vectorisation. gcc will
err on the side of generating definitely correct code, rather than
generating fast code that might be wrong due to things like aliasing
issues.
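To make the aliasing point concrete, here is a minimal sketch. The function names are made up for illustration. In the first version the compiler has to allow for dst overlapping a or b, so it may add a runtime overlap check or not vectorise the loop at all; in the second, the C99 "restrict" qualifier is the programmer's promise that the arrays do not overlap. Whether either loop is actually vectorised depends on the gcc version, the target and the flags (for example -O3 or -ftree-vectorize), so treat this as something to compile and inspect rather than a guarantee.

```c
#include <stddef.h>

/* Plain pointers: dst might overlap a or b, so the compiler has to
   keep the result correct for that case too. */
void add_arrays(float *dst, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* restrict (C99): the caller promises that the three arrays do not
   overlap. Calling this with overlapping arrays is undefined
   behaviour, so the promise has to be true. */
void add_arrays_restrict(float *restrict dst, const float *restrict a,
                         const float *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

Compiling both with -S and comparing the assembly for the two loops is an easy way to see what the compiler made of each.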
Is the best solution to use variable types that are
architecture-specific? For example, defining a series of variable types
that reflect every possible interpretation of the x86 XMM register file?
Or should we be more general about it and allow for arrays of type
int/float and hope that the compiler will vectorise them? There are
certain groups that are commonly used - for example, a matrix of 2x2
floating point numbers, or an array of 3 or 4 pairs of floats. Should
they have their own variable type?
I don't write much C code for the x86 - certainly not any where the
speed matters that much. So you are getting beyond my experience here.
But the best advice is to try it and see - make some examples, compile
with different options, and examine the generated assembly. Small
examples will be easier to follow, but on the other hand the compiler
has more scope for optimisation when given plenty of code.
There are "intrinsic" functions in gcc for some processors that give you
direct access to SIMD instructions. There are also special types or
attributes for variables for such code. Again, I don't know if that
applies to the x86. But in general, if you are using "normal" types and
want to give the compiler its best chance, make sure you avoid "manual
optimisation". The compiler can do far more with an array of ints than
it can with a pointer-to-int, because it knows much more about it.
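As one example of the "special types or attributes" mentioned above, gcc has a generic vector extension that is not tied to any one processor. The sketch below assumes that extension is available; the names v4sf and add4 are just illustrative. It defines a type holding four floats and adds two of them with an ordinary "+". On a target with suitable SIMD registers gcc can map this onto vector instructions, and elsewhere it falls back to scalar code.

```c
#include <string.h>

/* gcc vector extension: four floats treated as one 16-byte value. */
typedef float v4sf __attribute__((vector_size(16)));

/* Add two groups of four floats element by element. memcpy is used to
   move data in and out of the vector type so that the caller's arrays
   need no particular alignment. */
void add4(const float a[4], const float b[4], float out[4])
{
    v4sf va, vb, vsum;

    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);
    vsum = va + vb;              /* element-wise addition */
    memcpy(out, &vsum, sizeof vsum);
}
```

A type like this is one possible middle ground between plain arrays and processor-specific intrinsics, though how good the generated code is would still need checking in the assembly output.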
gcc does not knowingly produce buggy code - therefore it cannot produce
code that is "less buggy". But I believe it can produce alternate
pathways that are chosen at runtime according to the processor being
used - though I haven't worked with such code myself.
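Choosing a pathway at runtime can also be written by hand. The following is only a sketch under some assumptions: the function names are invented, sum_tuned is a stand-in written in plain C rather than real processor-specific code, and the check uses __builtin_cpu_supports, which exists only on x86 targets in newer gcc releases (4.8 and later). On any other target the sketch simply takes the generic path.

```c
#include <stddef.h>

typedef long (*sum_fn)(const int *, size_t);

/* Baseline version that is safe on every processor. */
long sum_generic(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Stand-in for a version built for a specific processor. It is plain C
   here (two elements per iteration) so that the sketch stays portable. */
long sum_tuned(const int *v, size_t n)
{
    long total = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2)
        total += (long)v[i] + v[i + 1];
    if (i < n)
        total += v[i];
    return total;
}

/* Assumption: x86 target and a compiler that provides
   __builtin_cpu_supports. Everything else reports "no". */
int cpu_has_sse2(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    return __builtin_cpu_supports("sse2") != 0;
#else
    return 0;
#endif
}

/* Pick an implementation according to the processor we are running on. */
long sum_dispatch(const int *v, size_t n)
{
    sum_fn f = cpu_has_sse2() ? sum_tuned : sum_generic;
    return f(v, n);
}
```

Both implementations must give the same answer for this to be safe, so the only thing the runtime check changes is which one runs.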
Well, some code that is perfectly legal for both the 68000 and 68060
chips will not work correctly on the 68060 due to errata. Certain
instruction sequences are not executed correctly even though they should
be - and one would only know if one reads the errata documents. I wonder
whether gcc avoids such instruction sequences.
It will avoid such instruction sequences if you tell it to. You can
choose the target(s) that the code should run on, and you can also
choose the target for optimisation. For example, you can choose to
generate code that will run on all 68xxx processors, but is optimised
for the 68060 - it will run correctly on them all, but will use much
less absolute addressing (which is typically a fast choice on a 68000,
but slow on a 68060).