reproducing PA performance results

pmeerw@xxxxxxxxxx (Peter Meerwald) · Thu, 5 Apr 2012 10:51:41 +0200 (CEST)

Hello,

looking at Arun's PA vs. AudioFlinger comparison [1], I'm wondering how to 
test PA performance in a reproducible and reliable way

my interest is in assessing the ARM NEON patches I submitted [3]; so
far, there are some micro-benchmarks comparing the different
implementations (plain C vs. C with NEON intrinsics), but they don't show
the impact on overall system performance

Arun is testing on OMAP4460, my platform is OMAP3730

I'll start with some questions on the test procedure:

clock has been reduced to 350MHz in Arun's tests (presumably to make the 
differences more measurable) -- how is the clock reduced?

Arun measures with top; I see the top output fluctuate widely -- how do
you read the output? do you average the results? how is top started
(e.g. top -d 1)?
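
one way to take some of the guessing out of reading top is to sample the
daemon's accumulated CPU time directly and average over a longer interval;
below is a minimal sketch, assuming Linux /proc and Python on the target
(the function names are made up for illustration, and this is not how Arun
measured):

```python
import os
import time


def parse_cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from a /proc/<pid>/stat line."""
    # The command name sits in parentheses and may itself contain spaces
    # or ')' characters, so split at the last ')'.
    fields = stat_line.rsplit(")", 1)[1].split()
    # fields[0] is the process state (field 3 in proc(5)); utime and stime
    # are fields 14 and 15, i.e. indices 11 and 12 in this list.
    return int(fields[11]) + int(fields[12])


def cpu_percent(ticks_before, ticks_after, elapsed_s, ticks_per_s):
    """Average CPU share (percent of one core) over the interval."""
    return 100.0 * (ticks_after - ticks_before) / (ticks_per_s * elapsed_s)


def measure(pid, duration_s=30.0):
    """Average CPU usage of process `pid` over `duration_s` seconds."""
    path = "/proc/%d/stat" % pid
    ticks_per_s = os.sysconf("SC_CLK_TCK")
    with open(path) as f:
        before = parse_cpu_ticks(f.read())
    start = time.monotonic()
    time.sleep(duration_s)
    with open(path) as f:
        after = parse_cpu_ticks(f.read())
    elapsed = time.monotonic() - start
    return cpu_percent(before, after, elapsed, ticks_per_s)
```

calling measure() with the pulseaudio PID while a stream is playing gives
one averaged figure per run; it only counts time charged to that process,
so kernel/ALSA work accounted elsewhere is not included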

was the audio data stereo or mono? what does the hardware support?

what tool was used for playback? (Arun mentions the async API but gives no
further details); what is wrong with using pacat and stating the options used?

how is the Speex resampler used? float or fixed? quality?

how is the PA daemon configured?
realtime? priorities? shm yes/no?

I am not happy with this test procedure; anyway, here are some
results comparing 44.1 kHz and 48 kHz stereo playback on OMAP3730
(beagleboard-xm @ 900MHz, set via the mpurate kernel parameter); pulseaudio
(1.99.1) is started with --system and compiled with gcc 4.6.3 and -O2
-march=armv7-a -ffast-math -fPIC -mfloat-abi=softfp -mfpu=neon
I am forcing PA to default-sample-rate = 48000 and alternate-sample-rate =
48000 (PA fails after idle with alternate-sample-rate = 44100)
Speex is patched with [2]

48 kHz stereo playback takes < 1% CPU

this is just PA/ALSA overhead

44.1 kHz stereo playback takes ~ 3% CPU (Speex float-3 resampler w/NEON, PA
with NEON)

here we have 44.1 kHz -> 48 kHz resampling, and sint16->float32 / float32->sint16 conversion

44.1 kHz stereo playback takes ~ 5% CPU (Speex float-3 resampler w/NEON, PA
without NEON)

here we have 44.1 kHz -> 48 kHz resampling, and sint16->float32 / float32->sint16 conversion
NEON optimization of the sample format conversion pays off
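
to make the extra work in the 44.1 kHz case concrete: the rate ratio is
48000/44100 = 160/147, and every sample is converted to float and back
around the float resampler; a minimal sketch of those two steps follows,
assuming a scale factor of 32768 and clamping on the way back (one common
convention, not taken from the PA or Speex sources; the function names are
made up for illustration):

```python
from fractions import Fraction
import array


def resample_ratio(rate_in, rate_out):
    """Exact output/input rate ratio as a reduced fraction."""
    return Fraction(rate_out, rate_in)


def s16_to_float(samples):
    """Scale signed 16-bit samples to 32-bit floats in [-1.0, 1.0)."""
    return array.array("f", (s / 32768.0 for s in samples))


def float_to_s16(samples):
    """Scale floats back to signed 16-bit, rounding and clamping."""
    out = array.array("h")
    for x in samples:
        v = int(round(x * 32768.0))
        # Clamp so that values at or above 1.0 cannot overflow int16.
        out.append(max(-32768, min(32767, v)))
    return out
```

this per-sample scaling and clamping is the kind of loop the NEON
conversion code vectorizes; a fixed-point resampler would skip both calls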

the Speex fixed-3 resampler makes more sense and is probably a bit more 
efficient; it saves the sint16->float32 / float32->sint16 conversion

I am measuring
pacat 48KHz.wav vs. pacat 44KHz.wav

observations:

I see memory and CPU consumption increase slightly (in top) when
playing a stream -- need to investigate further

does shm make a difference?
does --realtime or priorities make a difference?

is the fixed or the float Speex NEON resampler faster? hard to tell...

latency vs. CPU?

how to get better performance readings?
profiling is not so easy to set up and operate

Arun reported that some of the NEON code might actually be slower on 
OMAP4 and/or using hardfp, let's see

conclusions:
resampling audio is not free (consistent with Arun's result)
observing top output involves too much guessing 

regards, p.

[1] http://arunraghavan.net/2012/01/pulseaudio-vs-audioflinger-fight/
[2] https://blueprints.launchpad.net/linaro-multimedia-speex/+spec/linaro-mmwg-speex-neon-update
[3] http://permalink.gmane.org/gmane.comp.audio.pulseaudio.general/12574

-- 

Peter Meerwald
+43-664-2444418 (mobile)