ARM NEON patches

On Friday 10 February 2012 19:13:38 Peter Meerwald, you wrote:
> > "-mfpu=neon", then the compiler assumes the code will run ONLY on
> > NEON-capable ARM devices. If you want to do run-time detection, you MUST
> > NOT pass the corresponding compiler flag. (The same is true of MMX and
> > SSE by the way.)
> 
> so the simple solution is to
> - drop the runtime check
> - use NEON if the compiler provides NEON
> 
> if the code has to run on non-NEON platforms, NEON support cannot be
> enabled in the compiler

Correct.

> the more involved solution is to
> - have the runtime check in place
> - compile code with different compiler flags
> - make a decision at runtime and call different code path

Right.
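
As a rough illustration of that second route, here is a minimal C sketch of run-time dispatch through a function pointer. All names (halve_s16_c, halve_s16_neon, select_halve, HAVE_NEON_UNIT) are hypothetical and not taken from PulseAudio or VLC. It assumes the NEON variant would live in a separate file built with -mfpu=neon, and that detection on Linux/ARM goes through getauxval(AT_HWCAP), which needs glibc 2.16 or later.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__linux__) && defined(__arm__)
#include <sys/auxv.h>   /* getauxval(), glibc >= 2.16 */
#include <asm/hwcap.h>  /* HWCAP_NEON */
#endif

/* Portable fallback, built with the project's default compiler flags. */
static void halve_s16_c(int16_t *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = (int16_t)(buf[i] / 2);
}

/* Hypothetical NEON variant. It would be defined in its own compilation
 * unit, the only one compiled with -mfpu=neon, and HAVE_NEON_UNIT is an
 * assumed macro set by the build system when that unit is built. */
#if defined(HAVE_NEON_UNIT)
void halve_s16_neon(int16_t *buf, size_t n);
#endif

/* Run-time check: nonzero only if the CPU reports NEON support. */
static int cpu_has_neon(void)
{
#if defined(__linux__) && defined(__arm__)
    return (getauxval(AT_HWCAP) & HWCAP_NEON) != 0;
#else
    return 0;
#endif
}

typedef void (*halve_fn)(int16_t *, size_t);

/* Decide once which code path to call. */
halve_fn select_halve(void)
{
    if (cpu_has_neon()) {
#if defined(HAVE_NEON_UNIT)
        return halve_s16_neon;
#endif
    }
    return halve_s16_c;
}
```

The caller would store the result of select_halve() once at start-up and call through the pointer afterwards; nothing in this file needs NEON flags.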

It's much easier with dedicated assembler source code than with inline 
assembler or intrinsics, as you can simply override the FPU in the source:

	.fpu neon

In GCC, the "target" function attribute would address this problem nicely, but 
to my knowledge it is currently supported only on x86, not on ARM :-(

> in PulseAudio, the MMX/SSE code path use inline assembler; surprisingly
> (for me at least), gcc happily compiles inline assembler SSE/MMX code even
> with -march=i386, i.e. arch=i386 does not get passed to the assembler

I think that is a grandfathered bug in x86 GCC. But even then, the compiler 
will reject MMX or SSE registers in the clobber list of the inline assembler. 
Without a valid clobber list, you cannot safely write inline assembler. As 
long as the MMX and SSE registers are not used for anything else in the same 
thread, it works. Then one day, someone compiles the software with SSE for FPU 
computations or whatever, and it explodes due to register corruption.

So from my point of view, run-time MMX and SSE selection is just as hard as 
ARM NEON's. The only extra nicety is the per-function target attribute in GCC 4.4:

__attribute__((__target__("mmx")))
__attribute__((__target__("sse")))
 
which enable MMX or SSE on a per-function basis. Then you can include the mm 
or xmm registers in the clobber list. So there's no need to fiddle with 
compiler flags.
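
To make that concrete, here is a minimal sketch of an SSE inline assembler routine using the target attribute together with a proper clobber list. The function names (add4_sse, add4_c, add4) are made up for illustration. The run-time check uses __builtin_cpu_supports(), which is an assumption on my part about the toolchain: it only appeared in GCC 4.8, so older compilers would need their own CPUID code.

```c
#include <stddef.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
/* SSE is enabled for this function only, so the xmm registers are
 * accepted in the clobber list even if the file is built without -msse. */
__attribute__((__target__("sse")))
static void add4_sse(float *dst, const float *src)
{
    __asm__ volatile (
        "movups (%0), %%xmm0\n\t"     /* load 4 floats from dst */
        "movups (%1), %%xmm1\n\t"     /* load 4 floats from src */
        "addps  %%xmm1, %%xmm0\n\t"   /* xmm0 += xmm1 */
        "movups %%xmm0, (%0)\n\t"     /* store the sums back to dst */
        :
        : "r" (dst), "r" (src)
        : "xmm0", "xmm1", "memory");
}
#endif

/* Plain C fallback for CPUs or compilers without SSE. */
static void add4_c(float *dst, const float *src)
{
    for (size_t i = 0; i < 4; i++)
        dst[i] += src[i];
}

/* Adds four floats from src into dst, picking the code path at run time. */
void add4(float *dst, const float *src)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    if (__builtin_cpu_supports("sse")) {
        add4_sse(dst, src);
        return;
    }
#endif
    add4_c(dst, src);
}
```

Because the clobber list names xmm0 and xmm1, the compiler knows not to keep its own values in those registers across the asm statement, which is exactly what breaks when the list is left out.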

> PulseAudio simply assumes that the compiler is recent enough to know about
> MMX/SSE, there is no compile-time probing or checks such as #ifdef
> __SSE__ (fair enough)

That's probably wrong. In VLC, we have had cases of corrupted builds depending 
on the compiler flags, while making that assumption.
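
One way to make the assumption explicit rather than implicit is to look at the predefined macros, which tell you what the compiler has enabled for the current compilation unit under the current flags. A small sketch (compiled_simd is a hypothetical helper, not something from VLC or PulseAudio):

```c
/* Reports which SIMD extension the compiler has enabled for THIS
 * compilation unit, based on the macros GCC predefines for the
 * active flags. It says nothing about the CPU the code will run on. */
const char *compiled_simd(void)
{
#if defined(__ARM_NEON__) || defined(__ARM_NEON)
    return "neon";
#elif defined(__SSE2__)
    return "sse2";
#elif defined(__SSE__)
    return "sse";
#elif defined(__MMX__)
    return "mmx";
#else
    return "none";
#endif
}
```

Note that if one of these macros is defined, the compiler is also free to use that extension anywhere in the file, so this complements the run-time check but does not replace it.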

> to take this solution, some build infrastructure is needed; it might be
> required as well for the SSE3 resampler patches in discussion
> 
> this means:
> - probe compiler flags (such as -msse2, -mfpu=neon)
> - probably configure options to override
> - passing different compiler flags to different compilation units
> 
> which route shall we go?


-- 
Rémi Denis-Courmont
http://www.remlab.net/
http://fi.linkedin.com/in/remidenis

