Hello,

looking at Arun's PA vs. AudioFlinger comparison [1], I'm wondering how to test PA performance in a reproducible and reliable way

my scope of interest is assessing the ARM NEON patches I submitted [3]; so far, there are some micro-benchmarks comparing the different implementations (plain C vs. C with NEON intrinsics), but they don't show the impact on the whole system's performance

Arun is testing on OMAP4460, my platform is OMAP3730

I'll start with some questions on the test procedure:

* the clock has been reduced to 350MHz in Arun's tests (presumably to make the differences more measurable) -- how is the clock reduced?

* Arun measures with top; I observe top output to fluctuate widely -- how do you read the output, do you average the results? how is top started (e.g. top -d 1)?

* was the audio data stereo or mono? what does the hardware support?

* what tool was used for playback? (Arun mentions the async API but gives no more info); what is wrong with using pacat and stating the particular options used?

* how is the Speex resampler used? float or fixed? which quality?

* how is the PA daemon configured? realtime? priorities? shm yes/no?

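a rough sketch of how the guessing with top could be avoided (this is not what Arun or I used, and the function names are made up for illustration): read the accumulated CPU time of the pulseaudio process from /proc/<pid>/stat before and after a fixed playback interval and take the difference; this averages over the whole interval instead of eyeballing per-second samples, but it only accounts for the daemon process itself

```python
import os
import time


def cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from a /proc/<pid>/stat line."""
    # The command name (field 2) is wrapped in parentheses and may contain
    # spaces, so split after the last ')'.
    fields = stat_line.rsplit(")", 1)[1].split()
    # fields[0] is field 3 (state); utime is field 14, stime is field 15.
    return int(fields[11]) + int(fields[12])


def cpu_percent(ticks_before, ticks_after, seconds, ticks_per_second):
    """Average CPU load in percent of one core over the interval."""
    return 100.0 * (ticks_after - ticks_before) / (ticks_per_second * seconds)


def measure(pid, seconds=60.0):
    """Average CPU load of process `pid` over roughly `seconds` seconds."""
    ticks_per_second = os.sysconf("SC_CLK_TCK")
    path = "/proc/%d/stat" % pid
    with open(path) as f:
        before = cpu_ticks(f.read())
    start = time.monotonic()
    time.sleep(seconds)
    with open(path) as f:
        after = cpu_ticks(f.read())
    elapsed = time.monotonic() - start
    return cpu_percent(before, after, elapsed, ticks_per_second)
```

calling measure() with the PID of the pulseaudio daemon while pacat is playing would give one averaged number per run; repeating the run a few times would show how much the number itself fluctuates
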
I am not happy with this testing procedure; anyway, here are some results comparing 44.1 and 48 kHz stereo playback on OMAP3730 (beagleboard-xm @ 900MHz via the mpurate kernel parameter)

pulseaudio (1.99.1) is started with --system and compiled with gcc 4.6.3 and
  -O2 -march=armv7-a -ffast-math -fPIC -mfloat-abi=softfp -mfpu=neon

I am forcing PA to default-sample-rate = 48000 and alternate-sample-rate = 48000 (PA fails after idle with alternate-sample-rate=41000)

Speex is patched with [2]

* 48 kHz stereo playback takes < 1% CPU
  this is just PA/ALSA overhead

* 44.1 kHz stereo playback takes ~ 3% CPU (Speex float-3 resampler w/NEON, PA with NEON)
  here we have 44.1 kHz -> 48 kHz resampling, and sint16->float32 / float32->sint16 conversion

* 44.1 kHz stereo playback takes ~ 5% CPU (Speex float-3 resampler w/NEON, PA without NEON)
  here we have 44.1 kHz -> 48 kHz resampling, and sint16->float32 / float32->sint16 conversion

NEON optimization of the sample format conversion pays off (~ 3% vs. ~ 5% CPU)

the Speex fixed-3 resampler makes more sense and is probably a bit more efficient; it saves the sint16->float32 / float32->sint16 conversion

I am measuring pacat 48KHz.wav vs. pacat 44KHz.wav

observations:

* I am seeing memory and CPU consumption increase slightly (in top) when playing a stream -- need to investigate further

* does shm make a difference?

* does --realtime or do priorities make a difference?

* is the fixed or the float Speex NEON resampler faster? hard to tell...

* latency vs. CPU?

* how to get a better performance reading? profiling is not so easy to set up and operate

* Arun reported that some of the NEON code might actually be slower on OMAP4 and/or when using hardfp, let's see

conclusions:

* resampling audio is not free (consistent with Arun's result)

* observing top output involves too much guessing

regards, p.

[1] http://arunraghavan.net/2012/01/pulseaudio-vs-audioflinger-fight/
[2] https://blueprints.launchpad.net/linaro-multimedia-speex/+spec/linaro-mmwg-speex-neon-update
[3] http://permalink.gmane.org/gmane.comp.audio.pulseaudio.general/12574

--
Peter Meerwald
+43-664-2444418 (mobile)