mini svolume vectorization framework

The goal was speed, speed, speed, and to stay as DRY as possible, with a
touch of robustness to odd configurations.  This code uses intrinsics to
do the SIMD work and has a build-time dependency on Boost.  It should (I
hope) be comparable to, if not faster than, the Orc code.  Readability
is arguable, and I should mention that I got the ideas for some of what
I did from Eigen (the template library for linear algebra).
Unfortunately, given the need for saturating multiplies, Eigen itself
was unsuitable for integral types in the volume code.

Inside is a basic, tested version of 16-bit SSE2 svolume mixing; it is
only integrated into the testing routine in svolume_sse.c.  Float
support was also added but is untested.  NEON code was also added but is
untested (I don't have an ARM machine to test on).  A non-vectorized
implementation is also included (yet again, untested).  So why submit
the patch now?  To get some feedback from others, i.e. here is what
things look like and how they perform; shall we carry forward?

This also led to the discovery of a sort of bug in the reference
implementation and in others that use the same technique:
154: 7fff != 5028 (0012 * 4740b0d)
936: 7fff != 1f2c (0007 * 4740b0d)
This is from the signed short result-checking code in said testing
routine, where my results differed from the current C reference
implementation.  The lhs is my result, whereas the rhs is from the
reference; the values in parentheses are the sample and the volume.
Clearly the reference implementation is not performing a saturating
multiply in all cases, though these are some big volume numbers one
probably won't see in practice.  Still, it confused me for a while when
I first started working on this code, and that big number is a valid
volume inside the scope of these functions (int32).
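
For anyone wanting to reproduce the comparison by hand, a minimal scalar
sketch of a saturating 16-bit volume multiply might look like the
following.  It assumes the volume is 16.16 fixed point (0x10000 is unity
gain) and that the scaled value is truncated toward zero;
scale_s16_saturating is a hypothetical name used for illustration, not a
function from the patch or from the existing code.

```cpp
#include <cstdint>

// Hypothetical helper, for illustration only.
// Assumption: volume is 16.16 fixed point, so 0x10000 means unity gain.
static int16_t scale_s16_saturating(int16_t sample, int32_t volume) {
    // Widen to 64 bits so the product cannot overflow for any int32 volume.
    int64_t t = static_cast<int64_t>(sample) * static_cast<int64_t>(volume);

    // Drop the 16 fractional bits (division truncates toward zero).
    t /= 0x10000;

    // Saturate to the signed 16-bit range instead of wrapping.
    if (t > INT16_MAX) t = INT16_MAX;
    if (t < INT16_MIN) t = INT16_MIN;
    return static_cast<int16_t>(t);
}
```

Under those assumptions, scale_s16_saturating(0x0012, 0x4740b0d) gives
0x5028 and scale_s16_saturating(0x0007, 0x4740b0d) gives 0x1f2c, because
both exact products fit in 16 bits after the shift.  0x7fff only appears
once the scaled value exceeds 32767, e.g. for
scale_s16_saturating(0x7fff, 0x20000).  So it may be worth
double-checking at which intermediate step the two implementations
diverge for these inputs.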

-Jason
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-mini-vectorization-framework-for-svolume-utilizing-C++.patch
Type: text/x-patch
Size: 21233 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/pulseaudio-discuss/attachments/20110403/76f9898a/attachment.bin>

