Hi,

I haven't read through the code at all, but I will give you a little general advice.

Try to cut the code to the absolute minimum that shows the problem. It makes it easier for you to work with and check, and it makes it easier for other people to examine. Also make sure that the code has no other dependencies such as extra headers - ideally people should be able to compile the code themselves and test it (I realise this is difficult for those who don't have an ARM handy).

Code that works without optimisation but fails with optimisation, or that only works when you make a variable volatile, always has a bug somewhere. Occasionally, it is a bug in the compiler - but most often it is a bug in the code. Either way, it is important to figure out the root cause, and not try to hide it by making things volatile (though that might be a good temporary fix for a compiler bug).

I am not familiar with Neon (and not as good as I should be at ARM assembly in general), but it looks to me as though you have used specific registers in your inline assembly, and assumed that the compiler uses specific registers for its own purposes (such as variables). Don't do that. When you have turned off all optimisation, the compiler is consistent about which registers it uses for different purposes - when optimising, it changes register usage in a very unpredictable way. You must be explicit - all data going into your assembly must be declared, as must all data coming out of the assembly. And if you use specific registers, you need to tell the compiler about them (as "clobbers") - and be aware that the compiler might be using those registers for the input or output values.

Getting inline assembly right is not easy, and it is often best to work with several small assembly statements rather than large ones - I usually make a "static inline" function around a line or two of inline assembly and then use that function in the code as needed.
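
A minimal sketch of that wrapper style is below. It is not taken from the code in question. The names sat_add32 and sat_sum are made up for illustration, and it uses the plain ARM "qadd" instruction rather than Neon to keep the example small. The point is only that the template names no registers, and that every value going in or out is declared as an operand. A portable C fallback is included so that the file also builds on machines without an ARM.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: saturating signed 32-bit addition.
 * On 32-bit ARM with the DSP extension this is one qadd instruction;
 * elsewhere a portable C fallback keeps the file buildable. */
static inline int32_t sat_add32(int32_t a, int32_t b)
{
#if defined(__GNUC__) && defined(__arm__) && defined(__ARM_FEATURE_DSP)
    int32_t result;
    /* No register is named in the template: %0 is the declared output,
     * %1 and %2 are the declared inputs, and the compiler chooses the
     * registers. If the template did hard-code a scratch register, it
     * would have to be listed in a third (clobber) section, e.g. "r3". */
    __asm__ ("qadd %0, %1, %2"
             : "=r" (result)          /* outputs */
             : "r" (a), "r" (b));     /* inputs  */
    return result;
#else
    /* Portable fallback: add in 64 bits, then clamp to the int32_t range. */
    int64_t wide = (int64_t)a + (int64_t)b;
    if (wide > INT32_MAX) return INT32_MAX;
    if (wide < INT32_MIN) return INT32_MIN;
    return (int32_t)wide;
#endif
}

/* Hypothetical caller: the loop and the addressing stay in C, and only
 * the one instruction that C cannot express is written in assembly. */
static int32_t sat_sum(const int32_t *values, size_t count)
{
    int32_t acc = 0;
    for (size_t i = 0; i < count; i++)
        acc = sat_add32(acc, values[i]);
    return acc;
}
```

With the operands declared like this, a call such as sat_sum(v, n) should not need any volatile variables and should behave the same with and without optimisation, because the compiler knows exactly what the assembly reads and writes.
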
This approach can make the result a lot clearer, and makes it easier to mix the C and assembly - the end result is often better than I would make in pure assembly.

Finally, is there a good reason why you need inline assembly rather than the NEON intrinsics provided by gcc?

<http://gcc.gnu.org/onlinedocs/gcc/ARM-NEON-Intrinsics.html>

Best regards,

David

On 19/02/14 20:04, Cody Rigney wrote:
> Hi,
>
> I'm trying to add NEON optimizations to OpenCV's LK optical flow. See
> link below.
> https://github.com/Itseez/opencv/blob/2.4/modules/video/src/lkpyramid.cpp
>
> The gcc version could vary since this is an open source project, but
> the one I'm currently using is 4.8.1. The target architecture is ARMv7
> w/ NEON. The processor I'm testing on is an ARM
> Cortex-A15(big.LITTLE).
>
> The problem is, in release mode (where optimizations are set) it does
> not work properly. However, in debug mode, it works fine. I tracked
> down a specific variable(FLT_SCALE) that was being optimized out and
> made it volatile and that part worked fine after that. However, I'm
> still having incorrect behavior from some other optimization. I'm new
> to inline assembly, so I thought maybe I'm doing something wrong
> that's not telling the compiler that I'm using a certain variable.
>
> Below is the code at its current state. Ignore all the comments and
> volatiles(for testing this problem) everywhere. It's WIP. I removed
> unnecessary functions and code so it would be easier to see. I think
> the problem is in the bottom-most asm block because if I do if(false)
> to skip it, I don't run into the problem. Thanks.
>
<snip>