Hello Tanu,

> On Sun, 2013-01-13 at 20:59 +0200, Tanu Kaskinen wrote:
> > On Sun, 2013-01-13 at 14:53 +0100, Peter Meerwald wrote:
> > > > > diff --git a/src/pulsecore/sconv_neon.c b/src/pulsecore/sconv_neon.c
> > > > > index 6fd966d..111b56f 100644
> > > > > --- a/src/pulsecore/sconv_neon.c
> > > > > +++ b/src/pulsecore/sconv_neon.c
> > > > > @@ -36,16 +36,11 @@ static void pa_sconv_s16le_from_f32ne_neon(unsigned n, const float *src, int16_t
> > > > >          "movs        %[n], %[n], lsr #2      \n\t"
> > > > >          "beq         2f                      \n\t"
> > > > >
> > > > > -        "vdup.f32    q2, %[plusone]          \n\t"
> > > > > -        "vneg.f32    q3, q2                  \n\t"
> > > > > -        "vdup.f32    q4, %[scale]            \n\t"
> > > > > -        "vdup.u32    q5, %[mask]             \n\t"
> > > > > +        "vdup.f32    q1, %[scale]            \n\t"
> > > > >
> > > > >          "1:                                  \n\t"
> > > > >          "vld1.32     {q0}, [%[src]]!         \n\t"
> > > > > -        "vmin.f32    q0, q0, q2              \n\t" /* clamp */
> > > > > -        "vmax.f32    q0, q0, q3              \n\t"
> > > > > -        "vmul.f32    q0, q0, q4              \n\t" /* scale */
> > > > > +        "vmul.f32    q0, q0, q1              \n\t" /* scale */
> > > > >          "vcvt.s32.f32 q0, q0, #16            \n\t" /* narrow */
> > >
> > > > You removed clamping - what happens if there's need for clamping? (I'm
> > > > not very good at reading assembly.)
> > >
> > > vrshrn does the narrowing int32->int16 (with saturation); the comment
> > > should be moved one line down
> >
> > The vcvt instruction converts floating-point numbers to fixed-point
> > numbers, with 16 bits in the integer part and 16 bits in the fractional
> > part, so most of the interesting stuff happens already in vcvt. How does
> > vcvt handle the situation where the float doesn't fit in the 16 bits
> > that are reserved for the integer part? Saturation or SIGFPE, or
> > something else? How is NaN handled? The reference[1] that I'm using
> > doesn't say anything about this...
> >
> > You say that vrshrn does its thing with saturation. Since the integer
> > part of the fixed-point input is already 16-bits, there's not much need
> > for saturation.
> > Only the rounding of the fractional part can cause
> > overflow, so do you mean that if the rounding would cause overflow,
> > vrshrn uses truncation instead of rounding? (This is not specified in
> > the reference either.)
> >
> > [1] http://infocenter.arm.com/help/topic/com.arm.doc.dui0204j/CIHFFGJG.html

> You never answered these questions, and the new patch version contains
> the same code. "vcvt.s32.f32 q0, q0, #16" converts four floats into four
> 16.16 fixed-point numbers. What happens if the input is greater than
> INT16_MAX?

here is some more detail:

vcvt.s32.f32 q0, q0, #16
does saturation (this is indeed not documented), so the result has a
16-bit integer part and a 16-bit fractional part

the following
vrshrn.s32 d0, q0, #16
shifts 16 bits to the right and rounds according to the shifted-out
fractional part (but does NOT saturate); this is an error, the correct
instruction is
vqrshrn.s32 d0, q0, #16
which does saturation and rounding

I'll post a v3

the test code below converts several values:

#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#ifdef __arm__
#include <arm_neon.h>
#else
#include <xmmintrin.h>
#endif

int main(void) {
    float values[] = {0.5, -0.5, 0.3, 0.6, 2.5, 3.5, 32000.5, 33000.5, -33000.5, 32767.5};
    size_t i;

    for (i = 0; i < sizeof(values) / sizeof(values[0]); i++) {
        float f = values[i];
        /* reference value: lrintf() in the current rounding mode */
        printf("%.3f %ld -- ", f, lrintf(f));
#ifdef __arm__
        float32x4_t x = vdupq_n_f32(f);
        int32x4_t y = vcvtq_n_s32_f32(x, 16);  /* float -> 16.16 fixed point */
        int16x4_t z = vqrshrn_n_s32(y, 16);    /* rounding, saturating narrow to int16 */
        printf("%08x %d\n", (unsigned) vgetq_lane_s32(y, 0), vget_lane_s16(z, 0));
#else
        __m128 x = _mm_set_ss(f);
        printf("%d\n", _mm_cvt_ss2si(x));      /* SSE float -> int conversion */
#endif
    }

    return EXIT_SUCCESS;
}

# on ARM NEON
# columns: input, lrintf() -- vcvt result (16.16, hex), vqrshrn result
     0.500      0 -- 00008000      1
    -0.500      0 -- ffff8000      0
     0.300      0 -- 00004ccc      0
     0.600      1 -- 00009999      1
     2.500      2 -- 00028000      3
     3.500      4 -- 00038000      4
 32000.500  32000 -- 7d008000  32001
 33000.500  33000 -- 7fffffff  32767
-33000.500 -33000 -- 80000000 -32768
 32767.500  32768 -- 7fff8000  32767

all values look reasonable; note that the results differ slightly from
lrintf() or SSE due to different rounding: NEON always rounds up on 0.5,
lrintf() rounds to the nearest even integer, so there is a maximum
deviation of 1 in some rare cases

thanks, regards, p.

-- 
Peter Meerwald
+43-664-2444418 (mobile)
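
For anyone without ARM hardware, the two NEON steps discussed above can be approximated in portable C. This is only a rough sketch and not the patch code: the helper names emu_vcvt_s32_f32_16() and emu_vqrshrn_s32_16() are made up for illustration, the truncation toward zero in the first step is inferred from the hex values in the table (0.3 gives 00004ccc), and mapping NaN to 0 is an assumption that was not tested in this thread.

```c
#include <stdint.h>

/* Hypothetical model of "vcvt.s32.f32 q0, q0, #16": float -> signed 16.16
 * fixed point, truncating toward zero and saturating at the int32 limits. */
static int32_t emu_vcvt_s32_f32_16(float f)
{
    double scaled;

    if (f != f)                      /* NaN: assumed to convert to 0 */
        return 0;
    scaled = (double) f * 65536.0;   /* exact in double for any finite float */
    if (scaled >= 2147483647.0)
        return INT32_MAX;            /* positive saturation, e.g. 33000.5 */
    if (scaled <= -2147483648.0)
        return INT32_MIN;            /* negative saturation, e.g. -33000.5 */
    return (int32_t) scaled;         /* a C cast truncates toward zero */
}

/* Hypothetical model of "vqrshrn.s32 d0, q0, #16": add half an LSB, shift
 * right by 16 (floor), then saturate to the int16 range. */
static int16_t emu_vqrshrn_s32_16(int32_t x)
{
    int64_t t = (int64_t) x + 0x8000;   /* rounding constant, 0.5 in 16.16 */
    int64_t r = t / 65536;

    if (t % 65536 < 0)                  /* C division truncates; fix up to floor */
        r--;
    if (r > INT16_MAX)
        r = INT16_MAX;                  /* e.g. 32767.5 would round to 32768 */
    if (r < INT16_MIN)
        r = INT16_MIN;
    return (int16_t) r;
}
```

Feeding 2.5f through both helpers gives 3, while lrintf(2.5f) gives 2 in the default rounding mode, which is the deviation of 1 described above. Dropping the two range checks in emu_vqrshrn_s32_16() models the non-saturating vrshrn, where 32767.5 would wrap around instead of stopping at 32767.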