Hi Paul,

On Thu, 2011-03-10 at 13:02 +0100, Paul Menzel wrote:
> Dear Arun,
>
>
> your commit messages of commit 4cd90d9e [1] says the following.
>
>         Since I haven't been able to test on other architectures, the
>         Orc code is only used when MMX/SSE* is present. This can be
>         changed in the future after testing on AMD and ARM machines.
>
> What tests need to be performed or what tests did you run to figure out
> that it works.

Sorry, I should've cleaned up that comment. By AMD, I meant CPUs with 3DNow! but no SSE/MMX support. I don't actually see the 3DNow! flags used anywhere other than detection, so there shouldn't be anything to worry about here.

On ARM, I actually did a quick test and the Orc performance was significantly worse [1]. I don't think I tested the NEON backend, though.

The test is simple: I #define RUN_TEST in each of the svolume_* files that I want to check, bump up the number of iterations to 10000 or more, and then load pulseaudio a few times to get a fair measurement (the test generates N random samples and runs the scaling function on them). If you want to try this, you'll also need to adjust the conditional Orc initialisation in src/pulsecore/cpu-orc.c.

Cheers,
Arun

[1] The hand-rolled ARM code is faster than the Orc ARM backend because the former uses a single instruction, available on ARM, to do what the Orc function does with multiple instructions. I'd spoken to Orc upstream (David Schleef) about the possibility of making this scaling operation a basic Orc operation, so that we could generate that instruction on ARM and fall back to multiple instructions on other architectures. He was amenable to the idea, but I haven't had time to actually hack this together.
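For anyone who wants to reproduce the kind of measurement described in the mail (N random samples, a scaling function, many iterations), here is a minimal stand-alone sketch. It is not the RUN_TEST code from the svolume_* files; every name in it (scale_s16, scale_s16_fn, fill_random_s16, time_scale_s16) is hypothetical, and it assumes signed 16-bit samples scaled by a 16.16 fixed-point volume (0x10000 meaning unity gain). The function pointer stands in for whichever implementation (plain C, Orc-generated, hand-written ARM) is being compared.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical signature shared by the implementations being compared. */
typedef void (*scale_s16_fn)(int16_t *samples, size_t n, int32_t volume);

/* Plain C reference (assumed format): scale each signed 16-bit sample by a
 * 16.16 fixed-point volume (0x10000 == unity gain), clamping to int16 range.
 * Note: >> on a negative value is an arithmetic shift on mainstream
 * compilers, though C leaves it implementation-defined. */
static void scale_s16(int16_t *samples, size_t n, int32_t volume)
{
    for (size_t i = 0; i < n; i++) {
        int64_t t = ((int64_t) samples[i] * volume) >> 16;
        if (t > INT16_MAX)
            t = INT16_MAX;
        else if (t < INT16_MIN)
            t = INT16_MIN;
        samples[i] = (int16_t) t;
    }
}

/* Fill a buffer with pseudo-random samples covering the full int16 range.
 * Two rand() calls are combined because RAND_MAX may be as small as 32767. */
static void fill_random_s16(int16_t *samples, size_t n, unsigned seed)
{
    srand(seed);
    for (size_t i = 0; i < n; i++) {
        int32_t r = ((rand() & 0xFF) << 8) | (rand() & 0xFF); /* 0..65535 */
        samples[i] = (int16_t) (r - 32768);
    }
}

/* Run `fn` over the same `n` random samples `iterations` times and return
 * the CPU time in seconds. The buffer is restored before each run so every
 * iteration scales identical input; that memcpy is included in the timing,
 * but it costs the same for every implementation being compared.
 * Returns -1.0 if allocation fails. */
static double time_scale_s16(scale_s16_fn fn, size_t n, unsigned iterations,
                             int32_t volume)
{
    assert(n > 0);
    int16_t *orig = malloc(n * sizeof *orig);
    int16_t *work = malloc(n * sizeof *work);
    if (orig == NULL || work == NULL) {
        free(orig);
        free(work);
        return -1.0;
    }
    fill_random_s16(orig, n, 1234u);

    volatile int16_t sink = 0; /* keeps the results observable */
    clock_t start = clock();
    for (unsigned it = 0; it < iterations; it++) {
        memcpy(work, orig, n * sizeof *work);
        fn(work, n, volume);
        sink ^= work[n - 1];
    }
    double secs = (double) (clock() - start) / CLOCKS_PER_SEC;

    free(orig);
    free(work);
    return secs;
}
```

To compare implementations, one would call time_scale_s16 once per candidate from a small main() with the same sample count, iteration count (10000 or more, as in the mail) and volume, after first checking that each candidate produces the same output as the reference on identical input, so that a fast but wrong implementation is not mistaken for an improvement.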