ARM NEON optimized code

On Mon, 2012-01-09 at 12:14 +0100, Peter Meerwald wrote:
> Hello,
> 
> I am about to prepare some ARM NEON optimized code for PulseAudio; 
> attached is a stand-alone test program demonstrating 
> sconv_s16le_from_float() and sconv_s16le_to_float() on 1019 samples
> 
> questions:
> is it acceptable to use ARM NEON intrinsics?
> or is __asm__ __volatile or assembler source preferred? 
> or Orc code?

My opinion on this is that we pick the one which performs best, and when
the solutions are comparable, pick the most easily maintained (Orc,
intrinsics, inline assembly, in decreasing order of maintainability).
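
To make the intrinsics option concrete, here is a rough sketch of what a
float to s16 conversion might look like. It is not the attached test
program: the sketch_* names, the 32767 scale factor, the clamp to
[-1, 1] and the round-half-away-from-zero rule are all assumptions made
for illustration. The NEON path is only compiled when the compiler
defines __ARM_NEON; elsewhere the scalar reference is used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON_SKETCH 1
#endif

/* Scalar reference (hypothetical): clamp to [-1, 1], scale by 32767,
 * round half away from zero, then truncate to int16_t. */
static void sketch_s16_from_float_ref(unsigned n, const float *src, int16_t *dst) {
    for (unsigned i = 0; i < n; i++) {
        float v = src[i];
        if (v > 1.0f) v = 1.0f;
        if (v < -1.0f) v = -1.0f;
        v *= 32767.0f;
        dst[i] = (int16_t) (v >= 0.0f ? v + 0.5f : v - 0.5f);
    }
}

#ifdef HAVE_NEON_SKETCH
/* NEON path (hypothetical): n must be a multiple of 4. */
static void sketch_s16_from_float_neon(unsigned n, const float *src, int16_t *dst) {
    const float32x4_t one = vdupq_n_f32(1.0f);
    const float32x4_t minus_one = vdupq_n_f32(-1.0f);
    const float32x4_t scale = vdupq_n_f32(32767.0f);
    const uint32x4_t half_bits = vreinterpretq_u32_f32(vdupq_n_f32(0.5f));
    const uint32x4_t sign_mask = vdupq_n_u32(0x80000000u);

    for (unsigned i = 0; i < n; i += 4) {
        float32x4_t v = vld1q_f32(src + i);
        v = vmaxq_f32(vminq_f32(v, one), minus_one);   /* clamp */
        v = vmulq_f32(v, scale);                       /* scale */
        /* bias = 0.5 with the sign of v, so truncation rounds half away */
        uint32x4_t sign = vandq_u32(vreinterpretq_u32_f32(v), sign_mask);
        float32x4_t bias = vreinterpretq_f32_u32(vorrq_u32(half_bits, sign));
        int32x4_t iv = vcvtq_s32_f32(vaddq_f32(v, bias)); /* truncates */
        vst1_s16(dst + i, vqmovn_s32(iv));             /* narrow to s16 */
    }
}
#endif

/* Entry point: vector body plus scalar tail when NEON is available. */
static void sketch_s16_from_float(unsigned n, const float *src, int16_t *dst) {
    assert(src != NULL && dst != NULL);
#ifdef HAVE_NEON_SKETCH
    unsigned vec = n & ~3u;
    sketch_s16_from_float_neon(vec, src, dst);
    sketch_s16_from_float_ref(n - vec, src + vec, dst + vec);
#else
    sketch_s16_from_float_ref(n, src, dst);
#endif
}
```

Calling sketch_s16_from_float(n, src, dst) with n not a multiple of 4
(such as the 1019 samples mentioned above) exercises both the vector
body and the scalar tail on a NEON build.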

> I picked intrinsics due to simplicity... the generated code (gcc-4.6, 
> -O2) looks clean
[...]
> # ./sconv_neon 
> checking NEON sconv_s16le_from_float(2038)
> NEON: 3723 usec.
> ref: 64516 usec.
> checking NEON sconv_s16le_to_float(2038)
> NEON: 1923 usec.
> ref: 18280 usec.
> 
> runtime is for 1000 repetitions on a Beagleboard-XM (NEON vs. reference C 
> code)

That is neat!
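
Going by the numbers quoted above, that is roughly a 17x speedup for
from_float (64516 / 3723) and about 9.5x for to_float (18280 / 1923).
For anyone wanting to reproduce this kind of measurement, a minimal
timing loop could look like the sketch below. It is not the harness
from the attached program: bench_usec, bench_ref and the use of clock()
(CPU time rather than wall time) are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical signature shared by the reference and optimized paths. */
typedef void (*bench_conv_t)(unsigned n, const float *src, int16_t *dst);

/* Stand-in conversion so the sketch is self-contained: clamp, scale by
 * 32767, round half away from zero. */
static void bench_ref(unsigned n, const float *src, int16_t *dst) {
    for (unsigned i = 0; i < n; i++) {
        float v = src[i];
        if (v > 1.0f) v = 1.0f;
        if (v < -1.0f) v = -1.0f;
        v *= 32767.0f;
        dst[i] = (int16_t) (v >= 0.0f ? v + 0.5f : v - 0.5f);
    }
}

/* Run fn reps times over the same buffers and return the elapsed CPU
 * time in microseconds. */
static double bench_usec(bench_conv_t fn, unsigned reps, unsigned n,
                         const float *src, int16_t *dst) {
    assert(fn != NULL);
    clock_t start = clock();
    for (unsigned r = 0; r < reps; r++)
        fn(n, src, dst);
    clock_t end = clock();
    return (double) (end - start) * 1e6 / (double) CLOCKS_PER_SEC;
}
```

Timing the reference and the optimized function with the same reps and
n, then dividing the two results, gives a speedup figure comparable to
the ones above.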

> if it looks OK to you, I'll go ahead and submit patches to integrate with 
> PA...
> 
> regards, p.

Cheers,
Arun


