[PATCH 2/2] core: Add ARM NEON optimized sample conversion code

arun.raghavan@xxxxxxxxxxxxxxx (Arun Raghavan) · Thu, 25 Oct 2012 15:09:55 +0530

On Thu, 2012-10-25 at 11:19 +0200, Peter Meerwald wrote:
> Hello Arun,
> 
> > I was poking around this a bit. An input of 0x3f4aaa95 after the
> > multiplication with 32767.0 should result in 0x46caa8ff but turns out to
> > be 0x46caa900. Still trying to figure out why.
> 
> I cannot follow your example; it always results in 0x46caa900 (using NEON 
> or not)

(This is from gdb rather than a C program, because I find it a bit
easier to show the reasoning that way.)

(gdb) call malloc(4)
$1 = (void *) 0x61c010
(gdb) call malloc(4)
$2 = (void *) 0x61c030
(gdb) call malloc(4)
$3 = (void *) 0x61c050
(gdb) call *(int*)$1 = 0x3f4aaa95
$4 = 1061857941
(gdb) call *(float*)$2 = 32767.0
$5 = 32767
(gdb) call *(float*)$3 = *(float*)$1 * *(float*)$2
$6 = 25940.498
(gdb) p /x *(int*)$3
$7 = 0x46caa8ff

This happens on both x86 and the Pandaboard.

> but I think I have a good explanation:
> 
> static void pa_sconv_s16le_to_float32ne(unsigned n, const int16_t *src, float *dst) {
[...]

Possibly we're talking about different things here -- I'm referring to
the float -> s16le conversion.

For the reverse case (s16le -> float), it might still be worth taking
the division's performance penalty rather than losing precision,
especially if the result is still a decent performance win over the
current code.

Regards,
Arun