On Wed, 2010-10-27 at 15:14 -0500, pl bossart wrote:
> > I've been doing some work optimising the software volume scaling code,
> > and along with my previous changes to decrease the maximum volume to
> > 2^31-1, there seems to be a pretty good performance increase (almost 2x
> > on my Core2 processor).
>
> Are you saying you have a 2x performance gain over sse assembly? That
> would most likely mean we need to fix the assembly for x86 and have an
> even better performance than with orc and its intermediate step of
> SIMD code generation...

That is what I got, even when I replaced the 32x16-bit volume
multiplication code with the same logic that I'm using in Orc. I don't
claim to be any good with SSE/MMX-fu, so it's likely we can do better
with hand-rolled code in most cases.

The SIMD code generation happens on the first call (and, once Orc
supports it, will only happen at init time), so that should really not
be a concern.

However, IMO it makes sense to switch if the performance gain of the
hand-rolled code isn't very significant, because the Orc code really is
far more maintainable. With time, Orc should also be able to do a better
job of worrying about minor differences between the various x86/x86_64
architectures, instruction scheduling differences, etc.

-- 
Arun
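For anyone following along, here is a minimal scalar sketch of the kind of 32x16-bit volume multiplication being discussed. It assumes the volume is a 16.16 fixed-point factor (0x10000 meaning unity gain) bounded by 2^31-1, so it fits in a signed 32-bit integer. The function name scale_s16 is hypothetical; this is not the actual PulseAudio or Orc code, only an illustration of the arithmetic.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: scale signed 16-bit samples in place by a 16.16
 * fixed-point volume (0x10000 == unity gain). The volume is assumed to lie
 * in [0, 2^31-1], i.e. it fits in a signed 32-bit integer. */
static void scale_s16(int16_t *samples, size_t n, int32_t volume)
{
    for (size_t i = 0; i < n; i++) {
        /* 32x16-bit multiply; the product always fits in 64 bits. */
        int64_t t = (int64_t)samples[i] * volume;

        /* Drop the 16 fractional bits of the volume. Assumes an
         * arithmetic right shift for negative values, as gcc/clang do. */
        t >>= 16;

        /* Saturate to the int16 range instead of wrapping around. */
        if (t > INT16_MAX)
            t = INT16_MAX;
        else if (t < INT16_MIN)
            t = INT16_MIN;

        samples[i] = (int16_t)t;
    }
}
```

A SIMD version would process several samples per instruction and usually handle the final clamp with a saturating pack; the scalar loop above only shows the per-sample arithmetic.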