[PATCH 2/2] core: Add ARM NEON optimized sample conversion code

pmeerw@xxxxxxxxxx (Peter Meerwald) · Thu, 25 Oct 2012 11:19:17 +0200 (CEST)

Hello Arun,

> I was poking around this a bit. An input of 0x3f4aaa95 after the
> multiplication with 32767.0 should result in 0x46caa8ff but turns out to
> be 0x46caa900. Still trying to figure out why.

I cannot follow your example; for me it always results in 0x46caa900
(using NEON or not)

but I think I have a good explanation:

static void pa_sconv_s16le_to_float32ne(unsigned n, const int16_t *src, float *dst) {
    pa_assert(src);
    pa_assert(dst);

    for (; n > 0; n--)
        *(dst++) = ((float) (*(src++))) / (float) 0x7FFF;
}

is the baseline implementation; notice that we have a division here

the NEON code does the equivalent of

    const float invscale = 1.0f / 0x7FFF;
    for (; n > 0; n--)
        *(dst++) = ((float) (*(src++))) * invscale;

notice that the division is replaced by multiplication with the inverse

also, these two C implementations show different results; the NEON
implementation gives exactly the results of the second C implementation
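
to make the difference visible, here is a minimal sketch that runs both C variants over every possible 16-bit sample and counts the inputs whose results are not bit-identical; the names s16_to_float_div, s16_to_float_mul and count_mismatches are made up for illustration and are not part of the patch, and assert() stands in for pa_assert():

```c
#include <assert.h>
#include <stdint.h>

/* Baseline-style conversion: one float division per sample. */
static void s16_to_float_div(unsigned n, const int16_t *src, float *dst) {
    assert(src);
    assert(dst);

    for (; n > 0; n--)
        *(dst++) = ((float) (*(src++))) / (float) 0x7FFF;
}

/* NEON-style conversion: multiply by a precomputed reciprocal.
 * The reciprocal is itself rounded to float, so each result is
 * rounded twice instead of once. */
static void s16_to_float_mul(unsigned n, const int16_t *src, float *dst) {
    const float invscale = 1.0f / 0x7FFF;

    assert(src);
    assert(dst);

    for (; n > 0; n--)
        *(dst++) = ((float) (*(src++))) * invscale;
}

/* Hypothetical helper: count how many of the 65536 possible input
 * samples produce different floats in the two variants. */
static unsigned count_mismatches(void) {
    unsigned mismatches = 0;
    int32_t s;

    for (s = INT16_MIN; s <= INT16_MAX; s++) {
        int16_t in = (int16_t) s;
        float a, b;

        s16_to_float_div(1, &in, &a);
        s16_to_float_mul(1, &in, &b);
        if (a != b)
            mismatches++;
    }
    return mismatches;
}
```

calling count_mismatches() from a small main() and printing the result should report a nonzero count when built without fast-math style options; as far as I can tell, with IEEE 754 single precision and round-to-nearest, sample 16416 is one input where the two variants differ in the last bit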

float division is prohibitively expensive on NEON at runtime, hence the
multiplication with the inverse

I think a C compiler is not allowed to make such an optimization (unless one
explicitly allows for precision loss, e.g. with GCC's -ffast-math)

regards, p.

-- 

Peter Meerwald
+43-664-2444418 (mobile)