On Sun, 2013-01-13 at 14:53 +0100, Peter Meerwald wrote: > > > @@ -115,18 +115,16 @@ void pa_sconv_s16le_from_float32ne(unsigned n, const float *a, int16_t *b) { > > > #if SWAP_WORDS == 1 > > > for (; n > 0; n--) { > > > int16_t s; > > > - float v = *(a++); > > > + float v = *(a++) * (1 << 15); > > > > > > - v = PA_CLAMP_UNLIKELY(v, -1.0f, 1.f); > > > - s = (int16_t) lrintf(v * 0x7FFF); > > > + s = (int16_t) PA_CLAMP_UNLIKELY(lrintf(v), -0x8000, 0x7FFF); > > > If you call lrintf() before clamping the input, you get undefined > > results in case the input is very large. I guess this would work: > > true; but this is on purpose to avoid the explicit clamping > > the aim is to use only implicit clamping (with saturation) when narrowing > the data type from long int to int16_t -- and it turns out that x86 and NEON have > suitable instructions for that > > 'large input' would be > LONG_MAX (or < LONG_MIN) -- so there is quite a > bit of 'headroom' between normally values around 1<<15, and 1<<31 (on > 32-bit systems) or 1<<63 (on 64-bit systems) > > > s = (int16_t) lrintf(PA_CLAMP_UNLIKELY(v, (float) -0x8000, (float) 0x7FFF)); > > > The same comment applies to all "from float" conversions. > > we can of course do the explicit clamping in C and tolerate deviating > results for large inputs in optimized implementations -- some other code > is seriously broken if the input is huge floats So by "explicit clamping" you mean clamping the float input, and by "implicit clamping" you mean clamping the result of converting the input to an integer. And you want to clamp in the integer domain because that's what the assembly code does too, so you should get matching results from both the C code and the assembly code. I think that's not guaranteed, though, because lrintf() return value on bad input is unspecified, so you don't know if lrintf() does the same thing as your assembly code. It's a good point, though, that if problematic input occurs, it means that the input is garbage already, so it doesn't really matter what we do with it. It might be that PA_CLAMP_UNLIKELY is slightly more efficient if compares integers rather than floats, so maybe it's best to leave the code as it is. > > > diff --git a/src/pulsecore/sconv_neon.c b/src/pulsecore/sconv_neon.c > > > index 6fd966d..111b56f 100644 > > > --- a/src/pulsecore/sconv_neon.c > > > +++ b/src/pulsecore/sconv_neon.c > > > @@ -36,16 +36,11 @@ static void pa_sconv_s16le_from_f32ne_neon(unsigned n, const float *src, int16_t > > > "movs %[n], %[n], lsr #2 \n\t" > > > "beq 2f \n\t" > > > > > > - "vdup.f32 q2, %[plusone] \n\t" > > > - "vneg.f32 q3, q2 \n\t" > > > - "vdup.f32 q4, %[scale] \n\t" > > > - "vdup.u32 q5, %[mask] \n\t" > > > + "vdup.f32 q1, %[scale] \n\t" > > > > > > "1: \n\t" > > > "vld1.32 {q0}, [%[src]]! \n\t" > > > - "vmin.f32 q0, q0, q2 \n\t" /* clamp */ > > > - "vmax.f32 q0, q0, q3 \n\t" > > > - "vmul.f32 q0, q0, q4 \n\t" /* scale */ > > > + "vmul.f32 q0, q0, q1 \n\t" /* scale */ > > > "vcvt.s32.f32 q0, q0, #16 \n\t" /* narrow */ > > > You removed clamping - what happens if there's need for clamping? (I'm > > not very good at reading assembly.) > > vrshrn does the narrowing int32->int16 (with saturation); the comment > should be moved one line down The vcvt instruction converts floating-point numbers to fixed-point numbers, with 16 bits in the integer part and 16 bits in the fractional part, so most of the interesting stuff happens already in vcvt. How does vcvt handle the situation where the float doesn't fit in the 16 bits that are reserved for the integer part? Saturation or SIGFPE, or something else? How is NaN handled? The reference[1] that I'm using doesn't say anything about this... You say that vrshrn does its thing with saturation. Since the integer part of the fixed-point input is already 16-bits, there's not much need for saturation. Only the rounding the fractional part can cause overflow, so do you mean that if the rounding would cause overflow, vrshrn uses truncation instead of rounding? (This is not specified in the reference either.) [1] http://infocenter.arm.com/help/topic/com.arm.doc.dui0204j/CIHFFGJG.html -- Tanu