[PATCH] core: Fix a little-endian bug in ARM svolume code

arun.raghavan@xxxxxxxxxxxxxxx (Arun Raghavan) · Tue, 23 Oct 2012 19:30:37 +0530

On Tue, 2012-10-23 at 15:48 +0200, Peter Meerwald wrote:
> Hello myself,
> 
> > comparing ARM vs. NEON code, the svolume s16 NEON code uses two MULs, 
> > while ARM can do with one -- the ARM instructions (smulwb, ssat) look 
> > ideal for the svolume_s16 code
> 
> for the record, NEON can also do it with one MUL:
> 
> static inline void vol_s16_neon(const uint32x4_t *vol4, int16_t *samples, unsigned length) {
>     asm volatile (
>     "mov        %[length], %[length], lsr #2\n\t"
>     "vld1.s32   {q1}, [%[vol]]\n\t"
>     "1:\n\t"
>     "vld1.16    {d0}, [%[samples]]\n\t"
>     "vshll.s16  q0, d0, #15\n\t"
>     "vqdmulhq.s32 q0, q0, q1\n\t"
>     "vmovn.s32  d0, q0\n\t"
>     "subs       %[length], %[length], #1\n\t"
>     "vst1.16    {d0}, [%[samples]]!\n\t"
>     "bgt        1b\n\t"
>       /* output operands (or input operands that get modified) */
>     : [samples] "+r" (samples), [length] "+r" (length)
>     : [vol] "r" (vol4) /* input operands */
>     : "memory", "cc", "q0", "q1" /* clobber list */
>     );
> }
> 
> Checking ARM NEON svolume
> func: 1291289 usec (min = 12817, max = 13184, stddev = 65.9113).
> orig: 2438875 usec (min = 24322, max = 25605, stddev = 130.359).
> Orc not supported. Skipping
> 100%: Checks: 3, Failures: 0, Errors: 0
> 
> this is a bit better than the previous NEON code (~1300000 vs. ~1510000), 
> but still slower than ARM (~920000)
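
For anyone following along, here is a rough scalar model of what one lane
of the quoted loop computes (vshll.s16 #15, then vqdmulh.s32, then
vmovn.s32). It is only a sketch: the name vol_s16_lane_model is made up,
and it assumes the volume is 16.16 fixed point with 0x10000 as unity,
which the quoted code does not state. It also relies on GCC/Clang
behaviour for right shifts of negative values and for narrowing casts.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of one NEON lane; not PulseAudio code.
 * Assumes vol is 16.16 fixed point (0x10000 == 1.0). */
static int16_t vol_s16_lane_model(int16_t sample, int32_t vol)
{
    /* vshll.s16 q0, d0, #15: widen to 32 bits and shift left by 15. */
    int32_t widened = (int32_t)sample * 32768;

    /* vqdmulh only saturates for INT32_MIN * INT32_MIN; the widened
     * sample never goes below -2^30, so that case cannot occur. */
    assert(widened != INT32_MIN);

    /* vqdmulh.s32: doubling multiply, keep the high 32 bits.
     * Net effect: (sample * vol) >> 16, using a single multiply.
     * The shift is arithmetic on GCC/Clang (rounds toward -inf). */
    int64_t doubled = 2 * (int64_t)widened * (int64_t)vol;
    int32_t high = (int32_t)(doubled >> 32);

    /* vmovn.s32: plain narrowing, keeps the low 16 bits and does not
     * saturate (GCC/Clang wrap modulo 2^16 on this cast). */
    return (int16_t)high;
}
```

Note that the narrowing step wraps rather than saturates, so volumes above
unity can overflow in this model, unlike the ssat-based ARM path.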

Nice catch on the alignment. I'm trying to extend our tests to catch
these cases. A couple of notes: Rémi Denis-Courmont mentions that you
will likely see performance benefits in the NEON code by sprinkling in
some preloads (PLD). I've also factored out the sconv code and that does
provide a win on all the boards I tried.
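
To illustrate the preload idea without touching the asm, here is a
minimal C sketch using __builtin_prefetch, which GCC and Clang typically
lower to PLD on ARM. The function name, the block size of 32 samples and
the 16.16 volume format are all assumptions made for the example, not
anything from the tree; the prefetch distance would need tuning per board.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper; not PulseAudio code. Scales s16 samples by a
 * 16.16 fixed-point volume (assumed: 0x10000 == 1.0), clamping to the
 * int16_t range, and prefetches the next block before working on the
 * current one. */
static void scale_s16_prefetch(int16_t *samples, size_t n, int32_t vol)
{
    enum { BLOCK = 32 };  /* 64 bytes of samples; an untuned guess */

    assert(samples != NULL || n == 0);

    for (size_t i = 0; i < n; i += BLOCK) {
        size_t end = (n - i < BLOCK) ? n : i + BLOCK;

        /* Hint that the next block will be needed soon. Only issued
         * while the address is still inside the buffer. */
        if (end < n)
            __builtin_prefetch(&samples[end]);

        for (size_t j = i; j < end; j++) {
            /* Arithmetic right shift on GCC/Clang. */
            int64_t v = ((int64_t)samples[j] * vol) >> 16;
            if (v > INT16_MAX)
                v = INT16_MAX;
            if (v < INT16_MIN)
                v = INT16_MIN;
            samples[j] = (int16_t)v;
        }
    }
}
```

In the NEON loop the equivalent would be a pld on an address some
distance ahead of the samples pointer inside the loop body.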

To get this moving for 3.0, could you respin just the sconv patches on
top of master (I'll push out my testing code soon) so that we can push
that bit out first while we work on the others?

Cheers,
Arun