sse vector extensions, unions and inlining

Hi,

I've been messing around with the gcc vector extensions (sse) and the
assembly produced seems somewhat suboptimal. I'm not sure what
"optimal" is so I'm inquiring here first, before filing a bug report.

This concerns inlined functions that return vectors using the
struct/union return convention, that is, the address where the result
is to be stored is passed as a hidden first argument to the callee.
When the function returns a 'raw' vector type (such as
"double foo __attribute__((vector_size(16)))") that fits in a single
xmm register then the result of the call to the inlined function is
the same as manual inlining. However if a union is returned (such as
"union { double a[2]; double v __attribute__((vector_size(16))); }")
or if the vector type is too big for a register (such as
"double foo __attribute__((vector_size(32)))") then excessive stack
shuffling occurs, relative to manual inlining.

This is C, btw, so I understand that, in general, stack space has to
be reserved for the arguments (as opposed to const&) but I would
expect that after inlining, the optimizer could see that the arguments
are not modified and not bounce them through the stack, as it does for
things like int and double. Let's say there are functions f(a,b) = a+b,
g(a,b) = a*b, and h(a,b) = g(f(a,a),f(a,b)). Functions f and g are
inlined into h, but the body of h looks like this:

reserve stack space for what would have been calls to f and g
copy arguments into that space
load from that space into xmm registers
operate
copy from xmm registers into stack space
copy from stack space into the space pointed to by the hidden "return here" argument

If h is instead written out by hand as (a+a)*(a+b), this stack
shuffling does not happen.

Is this asking too much? Is there some fundamental reason why the
arguments to the inlined function need to be bounced through the
stack? This is with gcc 4.1.3. I've attached a test file and resulting
assembly. The difference is pretty striking, though I've not
benchmarked it.  I'm also aware this is not really the best way to use
sse (better to put each vector component in a separate array and
vectorize the loop) but I think maybe the issue is with inlined
functions that return structs/unions in general.

Thanks,
Scott

Attachment: vec_test.tar.gz
Description: GNU Zip compressed data

