Re: Advice about using SIMD extensions

Daniel Berlin <dberlin@xxxxxxxxxxx> · Thu, 24 Feb 2005 11:43:23 -0500

On Thu, 2005-02-24 at 13:48 +0100, Brian Budge wrote:
> Daniel -
> 
> Yeah, that's what I meant... but wouldn't optimal scheduling be nice ;)
> 
> I've been noticing this on a pentium4 (which it seemed was also what
> Richard was using).
> 
> It seems like SSE would be a pretty widely used target, and that's why
> I was surprised to get slowdowns on even simple vector
> additions/multiplies/etc...
> when mixed with other code.  If I ran very contrived examples, things
> ran very fast, but as soon as I put my library into an application, I
> noticed that things were slower, despite some things being calculated
> 4 times as fast.
> 
> It seems that you must use the intrinsics the same way that you'd
> write the assembly in order to get decent results.

You shouldn't have to.
The whole advantage of the intrinsics is that the compiler schedules
them for you :).

Anyway, looking at the scheduler descriptions, I don't see the Pentium 4
description including any sort of vector scheduling.

The Athlon description looks like it does.
Try -mcpu=k8 and see if it is any better.
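
If you want a small test case to compare the two tunings, something like
the sketch below might do. The function name madd4 and the multiple-of-4
length restriction are my own assumptions for illustration, not anything
from your library; the intrinsics themselves are the standard ones from
xmmintrin.h.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics: __m128 and the _mm_*_ps calls */

/* Hypothetical kernel: out[i] = a[i] * b[i] + c[i], four floats per step.
   Assumes n is a multiple of 4. Unaligned loads and stores are used so
   the caller does not have to guarantee 16-byte alignment. */
void madd4(float *out, const float *a, const float *b, const float *c,
           size_t n)
{
    size_t i;
    for (i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        __m128 vc = _mm_loadu_ps(c + i);
        /* The compiler is free to reorder these around neighbouring code. */
        __m128 r = _mm_add_ps(_mm_mul_ps(va, vb), vc);
        _mm_storeu_ps(out + i, r);
    }
}
```

Compiling it to assembly twice (say with -O2 -msse -S, once with
-mcpu=pentium4 and once with -mcpu=k8; I believe newer releases spell
this option -mtune) and diffing the output should show whether the
instruction order changes at all between the two descriptions.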

I should note that, AFAIK, Intel's compiler doesn't actually do
scheduling for the Pentium 4 anymore, because it wasn't worth it.  Maybe
that doesn't apply to vector instructions (or maybe the person who told
me this was wrong).