The goal was speed above all, with the code kept as DRY as possible and a touch of robustness to odd configurations. The code uses intrinsics for the SIMD work and has a build-time dependency on Boost. It should (I hope) be comparable to, if not faster than, the Orc code. Readability is arguable. I should mention that I got the ideas for some of what I did from Eigen (the template library for linear algebra). Unfortunately, given the need for saturating multiplies, Eigen itself was unsuitable for integral types in the volume code.

Inside is a basic, tested version of 16-bit SSE2 svolume mixing; it is only integrated into the testing routine in svolume_sse.c. Float support was also added but is untested. NEON code was also added but is untested (I don't have an ARM machine to test on). A non-vectorized implementation is included as well (yet again untested).

So why submit the patch now? To get some feedback from others, i.e. here's what things look like and how they perform; shall we carry on?

This also led to the discovery of a sort of bug in the reference implementation and in others using the same technique:

  154: 7fff != 5028 (0012 * 4740b0d)
  936: 7fff != 1f2c (0007 * 4740b0d)

This is from the signed short result-checking code in said testing routine, where my results differed from the current C reference implementation. The left-hand side is my result, whereas the right-hand side is from the reference. Clearly the reference implementation is not performing a saturating multiply in all cases, though these are big volume numbers one probably won't see in practice. Still, it confused me for a while when I first started working on this code, and that big number is a valid volume within the scope of these functions (int32).

-Jason

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-mini-vectorization-framework-for-svolume-utilizing-C++.patch
Type: text/x-patch
Size: 21233 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/pulseaudio-discuss/attachments/20110403/76f9898a/attachment.bin>
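
For anyone trying to follow the two mismatch lines in the message, here is a minimal sketch in C of one common way to scale a signed 16-bit sample by a 32-bit volume: treat the volume as 16.16 fixed point, multiply the high and low halves separately, and clamp the result to the int16 range. This is an assumption about the general technique, not a copy of the PulseAudio reference code or of the attached patch, and the names clamp_s16 and scale_s16_split are hypothetical. With the inputs from the mismatch lines (0012 and 0007 times 4740b0d) it produces 5028 and 1f2c, the right-hand values. It does not try to reproduce the 7fff values, since the patch's arithmetic is not shown in the message.

```c
#include <stdint.h>

/* Hypothetical helper: clamp a 32-bit value into the int16_t range. */
static int16_t clamp_s16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t) v;
}

/* Hypothetical sketch: scale one sample by a volume read as 16.16
 * fixed point (0x10000 == 1.0). The volume is split into its integer
 * part (hi) and fractional part (lo) so every intermediate product
 * fits in 32 bits. Note that >> on a negative value is
 * implementation-defined in C; most compilers shift arithmetically. */
static int16_t scale_s16_split(int16_t sample, int32_t volume)
{
    int32_t hi = volume >> 16;      /* integer part of the volume */
    int32_t lo = volume & 0xFFFF;   /* fractional part, 0..65535  */
    int32_t t  = sample;

    t = ((t * lo) >> 16) + (t * hi);
    return clamp_s16(t);            /* saturate to int16 at the end */
}
```

In this sketch the clamp only takes effect when the scaled value leaves the int16 range; for example, scale_s16_split(0x7fff, 0x20000) (a volume of 2.0) returns 0x7fff.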