On 5/19/2012 5:41 PM, Georg GCC user wrote:
On Sat, May 19, 2012 at 11:57 PM, Tim Prince <n8tm@xxxxxxx> wrote:
So, is there any way to make a 32 bit GCC emit mulpd and
the like automatically (and correctly), without explicitly
calling built-ins, like there is when everything is 64 bits?
(Maybe: Does the vectorizer, if applicable, recognize that
it may assume presence of suitable hardware and act
accordingly?)
VECTOR f(double a, double b, double c, double d)
{
return (VECTOR) { a OP c, b OP d };
}
(VECTOR is an array of two 64 bit floating-point components
in all GCC languages I have tried. As said, it works nicely
with 64 bit compilers.)
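For readers who want to reproduce this, here is a minimal sketch of one way the snippet above could be made compilable. The post does not say how VECTOR or OP are defined, so both definitions below are assumptions: VECTOR is taken to be GCC's generic vector type (the vector_size attribute) and OP is taken to be multiplication. The second function, f_packed, is a hypothetical variant that is not in the original post; it states the operation on whole vectors rather than on scalars.

```c
/* Assumption: VECTOR is a GCC generic vector of two doubles (16 bytes).
   The original post does not show its definition. */
typedef double VECTOR __attribute__((vector_size(16)));

/* Assumption: OP stands for one arithmetic operator; * is used here. */
#define OP *

/* Same shape as the function in the post: two scalar operations whose
   results are collected into a vector. */
VECTOR f(double a, double b, double c, double d)
{
    return (VECTOR) { a OP c, b OP d };
}

/* Hypothetical variant: build the two operands as vectors first and
   apply the operator once to the whole vectors. */
VECTOR f_packed(double a, double b, double c, double d)
{
    VECTOR lhs = { a, b };
    VECTOR rhs = { c, d };
    return lhs OP rhs;
}
```

The generated code can be inspected with something like gcc -m32 -O2 -msse2 -mfpmath=sse -S on the file. Whether a packed instruction shows up for either function depends on the GCC version and target, so this is only a starting point for comparison.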
Did you consider a suitable -march setting?
I tried dropping -march, and using -march=native and -march=core2
on this machine, and -march=corei7 on another. Irrespective of the
switches, the executables work. But they contain two xxxsd instructions,
unlike the single xxxpd the 64 bit compiler produces
in the 64 bit OS/GCC environment. The programs run on the same
physical processor. So the translated program does use SSE registers,
but the code operates only on single 64 bit doubles, not on 128 bit
pairs of doubles. This creates a huge difference in speed (up to 2x).
Hence my question: Does a 32 bit GCC, unlike a 64 bit GCC,
require explicitly calling intrinsics in the source program when
the physical CPU is 64 bit? Or is there a way to coax the
vectorizer of a 32 bit GCC so that it will emit xxxpd?
(The GCCs in the 32 bit software environments given won't accept -m64.)
Georg
Auto-vectorization should be enabled, as for x86_64, by setting a
suitable -march and by setting -O3 or -O2 -ftree-vectorize. The
additional option -ftree-vectorizer-verbose=2 should give information as
to why recognized loops aren't vectorized.
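As an illustration of the kind of code those options act on, here is a minimal sketch of a loop the auto-vectorizer can consider. The function name mul_arrays and the file name in the comment are hypothetical, and the command line is only one plausible combination of the options mentioned above; -mfpmath=sse is an addition that is commonly used on 32 bit targets and is not taken from this thread.

```c
#include <stddef.h>

/* Hypothetical kernel: element-wise product of two arrays of doubles.
   The restrict qualifiers promise the compiler that the arrays do not
   overlap, which removes one common obstacle to vectorization.

   One possible way to compile it on a 32 bit target (an assumption):
     gcc -m32 -O3 -march=core2 -mfpmath=sse \
         -ftree-vectorizer-verbose=2 -c mul_arrays.c                  */
void mul_arrays(double *restrict out, const double *restrict x,
                const double *restrict y, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = x[i] * y[i];
}
```

Later GCC releases report the same kind of information through the -fopt-info family of options (for example -fopt-info-vec-missed), so the verbose flag may need to be swapped depending on the compiler version.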
--
Tim Prince