On Mon, 2013-02-04 at 01:30 +0100, Peter Meerwald wrote:
> From: Peter Meerwald <p.meerwald at bct-electronic.com>
>
> use (1<<15) instead of 0x7fff as a factor when converting from s16 to float32
> use (1<<31) instead of 0x7fffffff as a factor when converting from s32 to float32
>
> the change is motivated by the following desirable properties:
> * s16_from_f32(f32_from_s16(x)) == x for all possible s16 values
> * x / (1.0f << 15) == x * (1.0f / (1 << 15)) for all x in s16
>
> above changes enable easier optimization while guaranteeing bit-exact results
>
> further, other audio sample conversion code (libavresample) does it the same way
>
> v3 (comments Tanu):
> * fix saturation in pa_sconv_s16le_from_f32ne_neon(), use vqrshrn
> v2 (comments Tanu):
> * fix comments in ARM NEON code
> * use llrintf() in pa_sconv_s32le_from_float32ne()
>
> Signed-off-by: Peter Meerwald <p.meerwald at bct-electronic.com>
> Cc: Tanu Kaskinen <tanuk at iki.fi>
> ---
>  src/pulsecore/sconv-s16le.c | 84 ++++++++++++++++++++-------------------------
>  src/pulsecore/sconv_neon.c  | 25 ++++++--------
>  src/pulsecore/sconv_sse.c   | 66 +++++++++++++++++------------------
>  3 files changed, 79 insertions(+), 96 deletions(-)

Thanks! Applied.

A general question about the SSE code: why does it process 8 floats at a
time instead of 4? That causes code duplication. I guess it's more
efficient to avoid looping as much as possible, so is there room for
further improvement by processing 12, 16 or more (as much as the number
of available registers allows) samples at a time? If so, I could file a
wishlist bug. Should the NEON code process more samples at a time too?

Another question (applies also to the old code): shouldn't the
pa_sconv_s16le_from_f32ne_sse2() assembly code end with the "emms"
instruction too like pa_sconv_s16le_from_f32ne_sse() does?

-- 
Tanu
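
For reference, the two properties listed in the quoted commit message can be checked with a small scalar sketch. This is not the PulseAudio code: the helper names f32_from_s16(), s16_from_f32() and count_mismatches() are made up for illustration, and the rounding is done by hand (half away from zero) so the snippet does not depend on libm, whereas the patch mentions llrintf() for the s32 path. It assumes IEEE 754 single-precision floats.

```c
#include <stdint.h>

/* Hypothetical helper: s16 -> float32 using the power-of-two factor 1<<15. */
static float f32_from_s16(int16_t s) {
    return (float) s * (1.0f / (1 << 15));
}

/* Hypothetical helper: float32 -> s16 with saturation.
 * Rounds half away from zero by hand; this is an assumption of the
 * sketch, not necessarily the rounding used in PulseAudio. */
static int16_t s16_from_f32(float f) {
    float x = f * (float) (1 << 15);

    /* Saturate in the float domain so the cast below stays in range. */
    if (x > 32767.0f)
        x = 32767.0f;
    if (x < -32768.0f)
        x = -32768.0f;

    return (int16_t) (int) (x + (x >= 0.0f ? 0.5f : -0.5f));
}

/* Counts the s16 values for which either property fails:
 *   1. s16_from_f32(f32_from_s16(x)) == x
 *   2. dividing by 2^15 equals multiplying by 2^-15 */
static int count_mismatches(void) {
    int bad = 0;

    for (int i = INT16_MIN; i <= INT16_MAX; i++) {
        int16_t s = (int16_t) i;
        float by_mul = (float) s * (1.0f / (1 << 15));
        float by_div = (float) s / (float) (1 << 15);

        if (s16_from_f32(f32_from_s16(s)) != s || by_mul != by_div)
            bad++;
    }
    return bad;
}
```

Calling count_mismatches() from a test program should give 0: scaling by a power of two only changes the float exponent, so every s16 value survives the round trip exactly, and multiplication and division agree bit for bit. With 0x7fff as the factor, 1.0f / 0x7fff is not exactly representable, so the same equivalence is not guaranteed.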