Hello,

> Surprise! I'm reviewing this now. :p

indeed :)

> 1. v3 drops intrinsics in favour of inline asm -- is that for
> performance reasons?

I noticed performance issues with certain compiler versions; inline asm
offers more control and a well-defined output. Further, alignment
annotations are not available with intrinsics (currently they are not
used, because I'm not sure about the alignment guarantees of certain PA
buffers). Intrinsics could probably be added later if there is enough
interest.

> 2. In the mono->stereo float case, the Cortex A9 code is actually
> slower. I recall that in a previous thread, we had this sort of
> situation on one of Panda/Beagleboard. Do we need some way to pick and
> choose implementations?

I only have a BeagleBoard-xM and a PandaBoard available as test platforms
(Cortex-A8 and A9, respectively). PATCH 2/6 now tests for A8 vs.
A9/A15/Axxx and chooses the code accordingly.

Another issue is benchmarking: relative performance differs depending on
the length of the buffers processed and on whether they are cached.

My target task involves stereo recording, resampling, int/float
conversion, and stereo-to-mono and mono-to-stereo mapping, and I am seeing
good speedups on both the BeagleBoard and the PandaBoard.

I need to check the downmix-to-mono behaviour after commit
ff4af902cf4ac07c5f1da3b6dacbb3195c7c222d ("resampler: Fix volume on
downmix to mono").

> 3. How shall we go about enabling this code? Have a configure time check
> for some instructions that are needed, build it in if available, and
> then run-time detection should pick the right code path?
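For picking implementations per core (point 2) and the run-time half of point 3, here is a minimal C sketch of one possible approach, not necessarily what PATCH 2/6 does. It assumes the core can be identified from the first "CPU part" field of /proc/cpuinfo (ARM part numbers 0xc08 for Cortex-A8, 0xc09 for A9, 0xc0f for A15). All type and function names are hypothetical, not PA API, and the "a8"/"a9" routines are plain C stand-ins for the real NEON variants.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical core classes; A9, A15, ... share one code path. */
typedef enum {
    ARM_CORE_UNKNOWN,
    ARM_CORE_A8,
    ARM_CORE_OTHER
} arm_core_t;

typedef void (*mono_to_stereo_fn)(float *dst, const float *src, unsigned n);

/* Classify from /proc/cpuinfo-style text via the first "CPU part" field.
 * ARM part numbers: 0xc08 = Cortex-A8, 0xc09 = A9, 0xc0f = A15. */
static arm_core_t classify_cpuinfo(const char *text) {
    const char *p = strstr(text, "CPU part");
    if (!p || !(p = strchr(p, ':')))
        return ARM_CORE_UNKNOWN;
    /* base 16 skips leading whitespace and accepts the "0x" prefix */
    unsigned long part = strtoul(p + 1, NULL, 16);
    if (part == 0)
        return ARM_CORE_UNKNOWN;
    return part == 0xc08 ? ARM_CORE_A8 : ARM_CORE_OTHER;
}

/* Plain C reference: copy each mono sample to the left and right channel. */
static void mono_to_stereo_c(float *dst, const float *src, unsigned n) {
    for (unsigned i = 0; i < n; i++)
        dst[2 * i] = dst[2 * i + 1] = src[i];
}

/* Stand-ins for hypothetical A8-tuned and A9-tuned NEON routines. */
static void mono_to_stereo_a8(float *dst, const float *src, unsigned n) {
    mono_to_stereo_c(dst, src, n);
}

static void mono_to_stereo_a9(float *dst, const float *src, unsigned n) {
    mono_to_stereo_c(dst, src, n);
}

/* Choose an implementation once at init time, not per call. */
static mono_to_stereo_fn pick_mono_to_stereo(arm_core_t core) {
    switch (core) {
    case ARM_CORE_A8:    return mono_to_stereo_a8;
    case ARM_CORE_OTHER: return mono_to_stereo_a9;
    default:             return mono_to_stereo_c;
    }
}

/* Read /proc/cpuinfo into buf (NUL-terminated); returns 0 on failure. */
static int read_cpuinfo(char *buf, size_t size) {
    FILE *f = fopen("/proc/cpuinfo", "r");
    if (!f)
        return 0;
    size_t n = fread(buf, 1, size - 1, f);
    fclose(f);
    buf[n] = '\0';
    return 1;
}
```

At init one would do something like `if (read_cpuinfo(buf, sizeof buf)) fn = pick_mono_to_stereo(classify_cpuinfo(buf));` and keep the generic C path as the fallback. Looking only at the first "CPU part" entry is enough on boards where all cores are the same type.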
I'd suggest modelling it after bluetooth/sbc: always compile the *_neon.c
files, but only activate the NEON code if defined(__ARM_NEON__).

The disadvantage is that we cannot have a common executable for NEON and
non-NEON ARM CPUs; I don't think this is a big constraint.

Remi Denis-Courmont suggests using .s assembler files to overcome this
issue; this would necessitate some configure options as well.

Interestingly, on x86/AMD64 gcc can emit MMX/SSE code in inline asm even
when the compiler itself is not enabled to generate such instructions,
hence no .s files in PA so far.

At runtime there already is an environment variable, PULSE_NO_SIMD, to
disable the optimized code paths; further, the output of /proc/cpuinfo is
parsed to see if NEON is available (kind of pointless, since it is a
compile-time decision).

> I'll take a closer look at things, run some tests, and start pushing
> this work. I'll also be moving all the test code to src/tests/cpu-test.c
> where the x86 tests have been consolidated, so running tests on
> different boards should become a lot less painful.

Thank you for the effort; let me know if there are questions!

Tests are not straightforward in some cases, as the actual implementation
is not exported.

Orc is broken on NEON: the loadpq opcode is not supported.

thanks, p.

-- 
Peter Meerwald
+43-664-2444418 (mobile)
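A minimal C sketch of how the enabling pieces described above could combine: the bluetooth/sbc-style compile-time guard, the PULSE_NO_SIMD override, and the /proc/cpuinfo NEON check. The macro and function names are hypothetical, and treating any value of PULSE_NO_SIMD as "disable" is an assumption about PA's behaviour, not a statement of its code. The decision function takes its inputs as parameters so it can be tested off-target.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Compile-time guard as in bluetooth/sbc: GCC defines __ARM_NEON__
 * when building for NEON (e.g. with -mfpu=neon). */
#if defined(__ARM_NEON__)
#define BUILT_WITH_NEON true
#else
#define BUILT_WITH_NEON false
#endif

static bool is_sep(char c) {
    return c == ' ' || c == '\t' || c == ':' || c == '\n' || c == '\0';
}

/* True if the "Features" line of /proc/cpuinfo-style text has a "neon" token. */
static bool cpuinfo_has_neon(const char *text) {
    const char *line = strstr(text, "Features");
    if (!line)
        return false;
    const char *end = strchr(line, '\n');
    if (!end)
        end = line + strlen(line);
    for (const char *p = line; p + 4 <= end; p++)
        if (memcmp(p, "neon", 4) == 0 &&
            (p == line || is_sep(p[-1])) && is_sep(p[4]))
            return true;
    return false;
}

/* Use NEON only if it was compiled in, not disabled via PULSE_NO_SIMD,
 * and the CPU reports it. At run time, pass BUILT_WITH_NEON,
 * getenv("PULSE_NO_SIMD") and the contents of /proc/cpuinfo. */
static bool should_use_neon(bool built_with_neon, const char *no_simd,
                            const char *cpuinfo) {
    if (!built_with_neon)
        return false;
    if (no_simd) /* assumption: any value disables the SIMD paths */
        return false;
    return cpuinfo_has_neon(cpuinfo);
}
```

As noted above, with a compile-time guard the cpuinfo check is mostly redundant: a NEON build will not run on a non-NEON CPU anyway, so the check only matters if the code is ever built as separate .s files that are always compiled in.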