What tests on AMD and ARM are needed for Orc-based optimised volume scaling?

arun.raghavan@xxxxxxxxxxxxxxx (Arun Raghavan) · Thu, 10 Mar 2011 17:59:28 +0530

Hi Paul,

On Thu, 2011-03-10 at 13:02 +0100, Paul Menzel wrote:
> Dear Arun,
> 
> 
> your commit message for commit 4cd90d9e [1] says the following.
> 
>         Since I haven't been able to test on other architectures, the
>         Orc code is only used when MMX/SSE* is present. This can be
>         changed in the future after testing on AMD and ARM machines.
> 
> What tests need to be performed, or what tests did you run to figure
> out that it works?

Sorry, I should've cleaned up that comment. By AMD, I meant CPUs with
3DNow! but no SSE/MMX support. I don't actually see the 3DNOW flags used
at any point other than detection, so there shouldn't be anything to
worry about here.

On ARM, I actually did a quick test and the Orc performance was
significantly worse than the hand-rolled ARM code [1]. I don't think I
tested the NEON backend,
though. The test is simple - I #define RUN_TEST in each of the svolume_*
files that I want to check, bump up the number of iterations to 10000 or
more, and then load pulseaudio a few times to get a fair measurement
(the test generates N random samples and runs the scaling function on
them). If you want to try this, you'll also need to adjust the
conditional orc initialisation in src/pulsecore/cpu-orc.c.
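
To make the shape of that test concrete, here is a minimal standalone
sketch in C. It is not the RUN_TEST code from the svolume_* files:
scale_s16 and bench_scale_s16 are hypothetical names, and the 16.16
fixed-point volume format (0x10000 as unity gain) is an assumption made
for illustration. It only shows the idea of generating N random samples
and timing many passes of a scaling function over them.

```c
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for a svolume_* routine: scale signed 16-bit
 * samples in place by a non-negative 16.16 fixed-point volume
 * (0x10000 is unity gain), clamping to the int16_t range.
 * Assumes >> on a negative value is an arithmetic shift, which holds
 * on mainstream compilers. */
static void scale_s16(int16_t *samples, size_t n, int32_t volume) {
    for (size_t i = 0; i < n; i++) {
        int64_t t = ((int64_t)samples[i] * volume) >> 16;
        if (t > INT16_MAX) t = INT16_MAX;
        if (t < INT16_MIN) t = INT16_MIN;
        samples[i] = (int16_t)t;
    }
}

/* Fill a buffer with n pseudo-random samples, run the scaler
 * `iterations` times, and return the elapsed CPU time in seconds.
 * Returns -1.0 if the buffer cannot be allocated. */
static double bench_scale_s16(size_t n, unsigned iterations, int32_t volume) {
    int16_t *buf = malloc(n * sizeof *buf);
    if (buf == NULL)
        return -1.0;

    srand(1); /* fixed seed so every run sees the same input */
    for (size_t i = 0; i < n; i++)
        buf[i] = (int16_t)(rand() % 0x10000 - 0x8000);

    clock_t start = clock();
    for (unsigned it = 0; it < iterations; it++)
        scale_s16(buf, n, volume);
    clock_t end = clock();

    /* Read one result through a volatile so the loop is not optimised away. */
    volatile int16_t sink = n ? buf[0] : 0;
    (void)sink;

    free(buf);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling something like bench_scale_s16(1 << 16, 10000, 0x8000) once per
candidate implementation, and repeating the run a few times, gives
numbers that can be compared in the same way as described above.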

Cheers,
Arun

[1] The hand-rolled ARM code is faster than the Orc ARM backend because
the former uses a single instruction that is available on ARM to do what
the Orc function does using multiple instructions. I'd spoken to Orc
upstream (David Schleef) about the possibility of having this scaling
operation as a basic Orc operation, so that we could generate that
instruction on ARM and fall back to the multiple instructions on other
architectures. He was amenable to the idea, but I haven't had time to
actually hack this together.
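
As a rough illustration of "one operation versus several", the sketch
below writes the same 16-bit scaling two ways in C. Everything in it is
an assumption for illustration: the 16.16 fixed-point volume format, the
function names, and the way the multi-step form is split. Which ARM
instruction is meant is not stated here, so the wide form only stands in
for the idea of a single multiply that keeps the upper bits of the
product.

```c
#include <stdint.h>

/* Clamp an intermediate value to the signed 16-bit sample range. */
static int16_t clamp_s16(int64_t v) {
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Multi-step form (hypothetical): split the non-negative 16.16 volume
 * into its integer and fractional halves, multiply each by the sample
 * in 32-bit arithmetic, and add the partial results.
 * Assumes >> on a negative value is an arithmetic shift. */
static int16_t scale_split(int16_t sample, int32_t volume) {
    int32_t hi = volume >> 16;     /* integer part of the volume */
    int32_t lo = volume & 0xFFFF;  /* fractional part of the volume */
    int32_t t = sample;
    t = ((t * lo) >> 16) + (t * hi);
    return clamp_s16(t);
}

/* Single-multiply form (hypothetical): one wide multiply, then keep
 * the bits above the 16 fractional ones. */
static int16_t scale_wide(int16_t sample, int32_t volume) {
    int64_t t = ((int64_t)sample * volume) >> 16;
    return clamp_s16(t);
}
```

Under those assumptions the two functions return the same value for
every sample and every non-negative volume, so a backend with a suitable
multiply could use the second form and others could use the first.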