On Thu, 2012-01-12 at 17:20 +0100, Peter Meerwald wrote:
> Hello,
>
> here is some optimized code for ARM NEON for sconv, svolume and remap
> (benchmarks below for a Beagleboard-XM)

Nice! Sorry it's taken so long. I've just pulled the 3 misc patches.
Will review the rest soon.

> I put this up for review, although there are still some rough edges:
> * there is no configure option for ARM NEON yet (there is none for
> SSE/MMX as well); NEON can be disabled at runtime by defining env.
> var. PULSE_NO_SIMD; ARM NEON code depends on __ARM_NEON__ #defined
> by the compiler

The lack of a configuration option is fine. And as I understand it, the
convention in the ARM world is that you compile for a given target and
run only on machines that are a superset of that target. So, unlike with
MMX/SSE, not having a run-time test is okay.

> * I have no runtime comparison for the orc svolume code yet (note that
> orc is not used on ARM yet, although it should be possible)

The ARM version of the svolume code makes use of the 'smulwb'
instruction, which makes it faster than the Orc code, since the Orc
version is a decomposition of that instruction.

> * I would like to be able to test the svolume/remap code against the
> C reference implementation, however, those are easily available/exposed
> (or I don't know how to get hold of a function pointer)

On the svolume side, the implementations are initialised in increasing
order of optimisation, so if you have the tests enabled for all of them,
you'll get the runtime numbers of each one with the previous
implementation as its reference.

> * the runtime of the existing ARMv6 implementation of volume_s16ne() looks
> very strange, has this been tested recently?

Very strange -- I'll try to get one of my boards running again and take
a look.

-- Arun
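
For readers unfamiliar with it, here is a minimal sketch in portable C of what the 'smulwb' instruction mentioned above computes: a signed 32-bit value multiplied by the signed bottom 16 bits of a second register, keeping the top 32 bits of the 48-bit product. The function names smulwb_emulated and scale_sample_s16 are hypothetical, and treating the 32-bit operand as a 16.16 fixed-point volume is an assumption made for illustration, not a copy of the PulseAudio code.

```c
#include <stdint.h>

/* Hypothetical helper: emulates ARM 'smulwb' in portable C.
 * Returns the top 32 bits of the 48-bit product rn * (bottom 16 bits of rm). */
static int32_t smulwb_emulated(int32_t rn, int32_t rm)
{
    int16_t bottom = (int16_t)(uint16_t)rm;  /* signed bottom halfword of rm */
    int64_t product = (int64_t)rn * bottom;  /* fits in 48 bits */
    /* Assumes an arithmetic right shift for negative values, which is
     * what common compilers implement. */
    return (int32_t)(product >> 16);
}

/* Hypothetical usage: scale one s16 sample by a 16.16 fixed-point volume
 * (0x10000 == 1.0, an assumption) and clamp to the s16 range. */
static int16_t scale_sample_s16(int16_t sample, int32_t volume)
{
    int32_t v = smulwb_emulated(volume, sample);
    if (v > INT16_MAX) v = INT16_MAX;
    if (v < INT16_MIN) v = INT16_MIN;
    return (int16_t)v;
}
```

In C this takes a widening multiply, a shift and a truncation; the hardware instruction does the same work in a single step.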