That makes sense. In this case, the input parameters are actually
memory addresses. So how would I do an output or clobber that would
tell the compiler that the memory at those addresses will change?

Thanks for your time,
Cody

On Thu, Feb 20, 2014 at 4:14 AM, Andrew Haley <aph@xxxxxxxxxx> wrote:
> Hi,
>
> On 02/19/2014 07:04 PM, Cody Rigney wrote:
>> I'm trying to add NEON optimizations to OpenCV's LK optical flow. See
>> link below.
>> https://github.com/Itseez/opencv/blob/2.4/modules/video/src/lkpyramid.cpp
>>
>> The gcc version could vary since this is an open source project, but
>> the one I'm currently using is 4.8.1. The target architecture is ARMv7
>> w/ NEON. The processor I'm testing on is an ARM
>> Cortex-A15(big.LITTLE).
>>
>> The problem is, in release mode (where optimizations are set) it does
>> not work properly. However, in debug mode, it works fine. I tracked
>> down a specific variable(FLT_SCALE) that was being optimized out and
>> made it volatile and that part worked fine after that. However, I'm
>> still having incorrect behavior from some other optimization.
>
> Forget about using volatile here. That's just wrong.
>
> You have to mark your inputs, outputs, and clobbers correctly.
>
> Look at this asm:
>
>     __asm__ volatile (
>         "vld1.16 {q0}, [%0]\n\t"     //trow0[x + cn]
>         "vld1.16 {q1}, [%1]\n\t"     //trow0[x - cn]
>         "vsub.i16 q5, q0, q1\n\t"    //this is t0
>         "vld1.16 {q2}, [%2]\n\t"     //trow1[x + cn]
>         "vld1.16 {q3}, [%3]\n\t"     //trow1[x - cn]
>         "vadd.i16 q6, q2, q3\n\t"    //this needs mult by 3
>         "vld1.16 {q4}, [%4]\n\t"     //trow1[x]
>         "vmul.i16 q7, q6, q8\n\t"    //this needs to add to trow1[x]*10
>         "vmul.i16 q10, q4, q9\n\t"   //this is trow1[x]*10
>         "vadd.i16 q11, q7, q10\n\t"  //this is t1
>         "vswp d22, d11\n\t"
>         "vst2.16 {q5}, [%5]\n\t"     //interleave
>         "vst2.16 {q11}, [%6]\n\t"    //interleave
>         :
>         : "r" (trow0 + x + cn),      //0
>           "r" (trow0 + x - cn),      //1
>           "r" (trow1 + x + cn),      //2
>           "r" (trow1 + x - cn),      //3
>           "r" (trow1 + x),           //4
>           "r" (drow + (x*2)),        //5
>           "r" (drow + (x*2)+8)       //6
>         :
>     );
>
> It has no outputs. How is this possible? It does a lot of work. It must
> have some outputs. I think there should be some outputs for this asm. I
> think they are memory outputs.
>
> Go through all the asm blocks, and mark the inputs, outputs, and clobbers.
> Then it should work.
>
> Remember one basic thing: you must tell GCC about everything that an
> asm does. If it affects memory, you must tell GCC. If it reads memory,
> you must tell GCC. Do not lie to the compiler: it will bite you.
>
> Andrew.
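
For the question at the top (how to tell the compiler that the memory
behind a pointer operand is read or written), here is a minimal sketch
of one possible approach, not the actual OpenCV fix. It covers only the
"t0" subtraction from the quoted asm. The function name sub8_i16 and
the plain C fallback are assumptions made for illustration; the
fallback exists only so the snippet also builds on targets without
32-bit ARM NEON. The memory is described with "m" operands that cast
the pointer to an 8-element array, and the NEON registers the asm
overwrites are listed as clobbers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: dst[i] = a[i] - b[i] for 8 int16_t lanes. */
void sub8_i16(const int16_t *a, const int16_t *b, int16_t *dst)
{
    assert(a && b && dst);
#if defined(__arm__) && (defined(__ARM_NEON) || defined(__ARM_NEON__))
    __asm__ (
        "vld1.16 {q0}, [%1]\n\t"      /* load 8 lanes from a          */
        "vld1.16 {q1}, [%2]\n\t"      /* load 8 lanes from b          */
        "vsub.i16 q0, q0, q1\n\t"     /* q0 = a - b                   */
        "vst1.16 {q0}, [%3]\n\t"      /* store 8 lanes to dst         */
        : "=m" (*(int16_t (*)[8]) dst)          /* %0: memory written */
        : "r" (a), "r" (b), "r" (dst),          /* %1..%3: addresses  */
          "m" (*(const int16_t (*)[8]) a),      /* %4: memory read    */
          "m" (*(const int16_t (*)[8]) b)       /* %5: memory read    */
        : "q0", "q1"                            /* registers trashed  */
    );
#else
    /* Portable fallback so the sketch runs on non-NEON targets. */
    for (int i = 0; i < 8; ++i)
        dst[i] = (int16_t)(a[i] - b[i]);
#endif
}
```

The blunter alternative is to keep the "r" operands as they are and add
"memory" to the clobber list. That tells GCC the asm may read or write
any memory, which is simpler to write but stops the compiler from
keeping unrelated values cached in registers across the asm. Either
way, every q register the asm overwrites belongs in the clobber list.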