Hi folks,

I've been doing some work optimising the software volume scaling code, and along with my previous changes to decrease the maximum volume to 2^31-1, there seems to be a pretty good performance increase (almost 2x on my Core2 processor).

The actual optimisations have been written in Orc[1], which is a language for writing simple "functions" that get translated to SIMD instructions at runtime. I should have sent this out a while back, since we're already using Orc for one of the echo-cancellation modules that was merged to master, but now that there could be core code using this, I thought I'd get more thoughts on making Orc an optional dependency of PulseAudio.

The way I've written things right now, the old C and hand-rolled assembly are still there. We use the Orc code only when Orc support is enabled and we're on a CPU where the Orc code is known to be faster. I've only written the mono and stereo S16NE functions so far, so for other formats, the old code is used.

If you don't have or don't want to use Orc, it can be disabled at configure time (--disable-orc). If you do enable it, a couple of files are generated for each Orc source program. These even contain a C fallback for when the system you're on doesn't have Orc, or is one that Orc doesn't have a backend for.

At some point, if the fallback C code and the Orc functions become good enough to replace everything else, we can look at just using these to replace all the other implementations. That day isn't today, though. :)

The code is at: http://git.collabora.co.uk/?p=user/arun/pulseaudio.git - there are also some fixes to the various volume scaling test code.

Comments/brickbats solicited :)

Cheers,
Arun

[1] http://code.entropywave.com/projects/orc/

p.s.: I've not tried out the Orc code on ARM (with or without NEON), so if anyone wants to give that a whirl before I get around to it, please do post the results here.
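
For anyone skimming, here is a rough C sketch of the selection logic described in this mail: fall back to the existing code unless Orc is compiled in, the CPU is one where Orc is known to win, and the format is mono or stereo S16NE. This is only an illustration, not the code in the branch. The HAVE_ORC macro, the function names, the orc_known_faster flag and the 16.16 fixed-point volume are all assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical signature for an S16NE volume-scaling routine. */
typedef void (*volume_func_t)(int16_t *samples, const int32_t *volumes,
                              unsigned channels, size_t n_samples);

/* Plain C path, always available. Volumes are assumed to be 16.16 fixed
 * point (0x10000 == unity); no clamping here to keep the sketch short.
 * Assumes >> on a negative value is an arithmetic shift. */
static void volume_s16ne_old(int16_t *samples, const int32_t *volumes,
                             unsigned channels, size_t n_samples) {
    for (size_t i = 0; i < n_samples; i++) {
        int64_t t = (int64_t) samples[i] * volumes[i % channels];
        samples[i] = (int16_t) (t >> 16);
    }
}

#ifdef HAVE_ORC
/* Stand-in for the Orc-backed routine. In a real build this would call
 * the function generated from the Orc source; here it just forwards. */
static void volume_s16ne_orc(int16_t *samples, const int32_t *volumes,
                             unsigned channels, size_t n_samples) {
    volume_s16ne_old(samples, volumes, channels, n_samples);
}
#endif

/* Pick the Orc path only when it is compiled in, the caller says this CPU
 * is one where Orc is known to be faster, and the layout is mono/stereo. */
static volume_func_t select_volume_func(bool orc_known_faster, unsigned channels) {
#ifdef HAVE_ORC
    if (orc_known_faster && (channels == 1 || channels == 2))
        return volume_s16ne_orc;
#else
    (void) orc_known_faster;
    (void) channels;
#endif
    return volume_s16ne_old;
}
```

Built without HAVE_ORC (the --disable-orc case), select_volume_func() always hands back the old routine, so callers never need to know whether Orc is present.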
The hand-rolled assembly should be faster for now, since it uses a single instruction for the multiply+shift operation.
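
To make the multiply+shift concrete, here is a minimal scalar C sketch of what happens per sample: widen, multiply by the channel volume, shift back down, clamp. It is an illustration under assumptions, not the actual PulseAudio or Orc code. The 16.16 fixed-point volume format (0x10000 as unity) and the names scale_s16ne and clamp_s16 are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Saturate a wide intermediate value to the signed 16-bit sample range. */
static int16_t clamp_s16(int64_t v) {
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t) v;
}

/* Scale interleaved native-endian S16 samples in place.
 * volumes[] holds one 16.16 fixed-point factor per channel (assumed format);
 * n_samples counts individual samples, not frames. */
static void scale_s16ne(int16_t *samples, size_t n_samples,
                        const int32_t *volumes, unsigned channels) {
    for (size_t i = 0; i < n_samples; i++) {
        /* Multiply in 64 bits so a large volume cannot overflow. */
        int64_t t = (int64_t) samples[i] * volumes[i % channels];
        /* Shift drops the 16 fractional bits. This relies on >> being an
         * arithmetic shift for negative values, which mainstream compilers
         * provide but the C standard leaves implementation-defined. */
        samples[i] = clamp_s16(t >> 16);
    }
}
```

The multiply and the shift are two separate steps here. An instruction that does both at once saves work in the inner loop, which is the advantage being described.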