New dependency: Orc

arun.raghavan@xxxxxxxxxxxxxxx (Arun Raghavan) · Thu, 28 Oct 2010 01:47:11 +0100

On Wed, 2010-10-27 at 15:14 -0500, pl bossart wrote:
> > I've been doing some work optimising the software volume scaling code,
> > and along with my previous changes to decrease the maximum volume to
> > 2^31-1, there seems to be a pretty good performance increase (almost 2x
> > on my Core2 processor).
> 
> Are you saying you have a 2x performance gain over sse assembly? That
> would most likely mean we need to fix the assembly for x86 and have an
> even better performance than with orc and its intermediate step of
> SIMD code generation...

That is what I got even when I replaced the 32x16-bit volume
multiplication code with the same logic that I'm using in Orc. I don't
claim to be any good with SSE/MMX-fu, so it's likely we can do better
with hand-rolled code in most cases. The SIMD code-generation happens on
the first call (and when Orc supports it, will only happen at init
time), so that should really not be a concern.
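
For readers unfamiliar with the term, here is a rough scalar sketch of what a
"32x16-bit volume multiplication" can look like. It is an illustration only,
not the PulseAudio or Orc code: the function names are hypothetical, and it
assumes the volume is a 16.16 fixed-point value (0x10000 = unity gain) and
that the compiler uses an arithmetic right shift for negative integers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: scale one signed 16-bit sample by a 32-bit volume.
 * Assumption: volume is 16.16 fixed point, so 0x10000 means "unchanged". */
static int16_t scale_sample_s16(int16_t sample, int32_t volume)
{
    int32_t hi = volume >> 16;     /* integer part of the volume */
    int32_t lo = volume & 0xFFFF;  /* fractional part of the volume */
    int32_t t = sample;

    /* Split the 32x16 multiply into two products that each fit in 32 bits.
     * Assumes >> on a negative value is an arithmetic shift. */
    t = ((t * lo) >> 16) + (t * hi);

    /* Clamp to the signed 16-bit range instead of wrapping around. */
    if (t > INT16_MAX)
        t = INT16_MAX;
    if (t < INT16_MIN)
        t = INT16_MIN;

    return (int16_t) t;
}

/* Hypothetical helper: apply the same volume to a buffer of samples in place. */
static void scale_buffer_s16(int16_t *samples, size_t n, int32_t volume)
{
    for (size_t i = 0; i < n; i++)
        samples[i] = scale_sample_s16(samples[i], volume);
}
```

A SIMD version, hand-written or generated, would apply the same per-sample
arithmetic to several samples at once.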

However, IMO, it makes sense to switch if the performance gain of the
hand-rolled code isn't very significant (because the Orc code really is
far more maintainable), and with time Orc should be able to do a better
job of worrying about minor differences between the various x86/x86_64
architectures, instruction scheduling differences, etc.

-- Arun