Hi,

On 02/19/2014 07:04 PM, Cody Rigney wrote:
> I'm trying to add NEON optimizations to OpenCV's LK optical flow. See
> link below.
> https://github.com/Itseez/opencv/blob/2.4/modules/video/src/lkpyramid.cpp
>
> The gcc version could vary since this is an open source project, but
> the one I'm currently using is 4.8.1. The target architecture is ARMv7
> w/ NEON. The processor I'm testing on is an ARM
> Cortex-A15(big.LITTLE).
>
> The problem is, in release mode (where optimizations are set) it does
> not work properly. However, in debug mode, it works fine. I tracked
> down a specific variable(FLT_SCALE) that was being optimized out and
> made it volatile and that part worked fine after that. However, I'm
> still having incorrect behavior from some other optimization.

Forget about using volatile here. That's just wrong. You have to mark
your inputs, outputs, and clobbers correctly.

Look at this asm:

    __asm__ volatile (
        "vld1.16 {q0}, [%0]\n\t"     //trow0[x + cn]
        "vld1.16 {q1}, [%1]\n\t"     //trow0[x - cn]
        "vsub.i16 q5, q0, q1\n\t"    //this is t0
        "vld1.16 {q2}, [%2]\n\t"     //trow1[x + cn]
        "vld1.16 {q3}, [%3]\n\t"     //trow1[x - cn]
        "vadd.i16 q6, q2, q3\n\t"    //this needs mult by 3
        "vld1.16 {q4}, [%4]\n\t"     //trow1[x]
        "vmul.i16 q7, q6, q8\n\t"    //this needs to add to trow1[x]*10
        "vmul.i16 q10, q4, q9\n\t"   //this is trow1[x]*10
        "vadd.i16 q11, q7, q10\n\t"  //this is t1
        "vswp d22, d11\n\t"
        "vst2.16 {q5}, [%5]\n\t"     //interleave
        "vst2.16 {q11}, [%6]\n\t"    //interleave
        :
        : "r" (trow0 + x + cn),      //0
          "r" (trow0 + x - cn),      //1
          "r" (trow1 + x + cn),      //2
          "r" (trow1 + x - cn),      //3
          "r" (trow1 + x),           //4
          "r" (drow + (x*2)),        //5
          "r" (drow + (x*2)+8)       //6
        :
    );

It has no outputs. How is this possible? It does a lot of work, so it
must have some outputs. I think they are memory outputs.

Go through all the asm blocks and mark the inputs, outputs, and
clobbers. Then it should work.
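
As a rough illustration of what "marking everything" can look like, here is a minimal sketch. It is not taken from lkpyramid.cpp: the function name sub8_i16, the operand names, and the plain C fallback for non-ARM builds are all made up for this example. It covers only the load/subtract/store pattern, and it uses the blunt "memory" clobber rather than precise "m" operands, on the assumption that simplicity matters more here than the last bit of optimization.

```c
#include <stdint.h>

/* Hypothetical helper: dst[i] = a[i] - b[i] for eight 16-bit lanes. */
static void sub8_i16(int16_t *dst, const int16_t *a, const int16_t *b)
{
#if defined(__arm__) && (defined(__ARM_NEON) || defined(__ARM_NEON__))
    /* "volatile" here qualifies the asm statement itself; it is not the
       same thing as declaring a C variable volatile. */
    __asm__ volatile (
        "vld1.16  {q0}, [%[a]]\n\t"    /* load 8 lanes from a       */
        "vld1.16  {q1}, [%[b]]\n\t"    /* load 8 lanes from b       */
        "vsub.i16 q0, q0, q1\n\t"      /* q0 = a - b                */
        "vst1.16  {q0}, [%[dst]]\n\t"  /* store 8 lanes to dst      */
        : /* no register outputs; the result goes to memory */
        : [dst] "r" (dst), [a] "r" (a), [b] "r" (b)
        : "q0", "q1",   /* NEON registers the asm overwrites        */
          "memory"      /* the asm reads and writes memory via ptrs */
    );
#else
    /* Portable fallback so the sketch also builds on non-NEON targets. */
    for (int i = 0; i < 8; ++i)
        dst[i] = (int16_t)(a[i] - b[i]);
#endif
}
```

With the clobber list filled in, GCC knows it may not keep live values in q0 or q1 across the asm, and that it may not cache or reorder the loads and stores to a, b, and dst around it.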

Remember one basic thing: you must tell GCC about everything that an
asm does. If it affects memory, you must tell GCC. If it reads memory,
you must tell GCC. Do not lie to the compiler: it will bite you.

Andrew.