New dependency: Orc

arun.raghavan@xxxxxxxxxxxxxxx (Arun Raghavan) · Thu, 28 Oct 2010 11:41:20 +0100

On Thu, 2010-10-28 at 01:47 +0100, Arun Raghavan wrote:
> On Wed, 2010-10-27 at 15:14 -0500, pl bossart wrote:
> > > I've been doing some work optimising the software volume scaling code,
> > > and along with my previous changes to decrease the maximum volume to
> > > 2^31-1, there seems to be a pretty good performance increase (almost 2x
> > > on my Core2 processor).
> > 
> > Are you saying you have a 2x performance gain over sse assembly? That
> > would most likely mean we need to fix the assembly for x86 and have an
> > even better performance than with orc and its intermediate step of
> > SIMD code generation...
> 
> That is what I got even when I replaced the 32x16-bit volume
> multiplication code with the same logic that I'm using in Orc. I don't

I forgot to mention that even the Orc MMX backend gives the same kind of
performance gain over the current hand-rolled MMX code (I didn't try to
rewrite the MMX assembly the way I did the SSE version).

Also, we don't have any NEON optimisations for the software volume code
in PulseAudio, so the Orc NEON backend might be interesting to try there.
I don't know if there's a SIMD version of the ARM 32x16-bit multiply, but
if there is, it should be possible to get Orc to use that as well.
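
For anyone following along, the 32x16-bit multiply mentioned above can be
sketched in scalar C roughly as below. This is only an illustration, not
the actual PulseAudio assembly or the Orc program: it assumes signed
16-bit samples and a 16.16 fixed-point volume (0x10000 = unity gain), and
the name scale_s16 is made up for the example.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a PulseAudio or Orc API): scale one signed
 * 16-bit sample by a 32-bit volume, assumed to be 16.16 fixed point.
 * The 32x16 product is built from two 16x16 multiplies, which is the
 * shape that maps onto 16-bit SIMD multiply instructions.
 */
static int16_t scale_s16(int16_t sample, uint32_t volume)
{
    int32_t hi = (int32_t)(volume >> 16);      /* integer part of the volume */
    int32_t lo = (int32_t)(volume & 0xFFFF);   /* fractional part of the volume */
    int32_t t = sample;

    /* Fractional contribution shifted back down, plus integer contribution.
     * Note: >> on a negative value is implementation-defined in C; this
     * assumes the usual arithmetic shift. */
    t = ((t * lo) >> 16) + (t * hi);

    /* Saturate to the signed 16-bit range instead of wrapping. */
    if (t > INT16_MAX)
        t = INT16_MAX;
    if (t < INT16_MIN)
        t = INT16_MIN;

    return (int16_t)t;
}
```

With these assumptions, a volume of 0x8000 halves a sample and anything
that would overflow 16 bits is clamped rather than wrapped.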

Cheers,
Arun