Thanks for the advice. I didn't realize before that volatile was
actually hiding the problem.

Do you mind providing an example of what you mean by using a "static
inline" function? That sounds like a better way of managing the
assembly. I know what you mean, but I would like to see an example of
the details (like passing parameters, etc).

Initially, I began writing the NEON acceleration in intrinsics. Then I
read more and more reports that NEON intrinsics are much slower when
compiled with gcc, reportedly because the generated code adds extra
stack pushes and pops. Apparently, the Microsoft ARM compiler and
Apple's ARM compiler do well with NEON intrinsics, but GCC does not.
So I switched to inline assembly. I haven't actually tested this
myself, but since OpenCV is cross-platform, I wanted to make the
acceleration work cross-platform in the fastest way.

Thanks,
Cody

On Thu, Feb 20, 2014 at 4:54 AM, David Brown <david@xxxxxxxxxxxxxxx> wrote:
> Hi,
>
> I haven't read through the code at all, but I will give you a little
> general advice.
>
> Try to cut the code to the absolute minimum that shows the problem. It
> makes it easier for you to work with and check, and it makes it easier
> for other people to examine. Also make sure that the code has no other
> dependencies such as extra headers - ideally people should be able to
> compile the code themselves and test it (I realise this is difficult for
> those who don't have an ARM handy).
>
> Code that works without optimisation but fails with optimisation, or
> that works when you make a variable volatile, is always a bug.
> Occasionally, it is a bug in the compiler - but most often it is a bug
> in the code. Either way, it is important to figure out the root cause,
> and not try to hide it by making things volatile (though that might be a
> good temporary fix for a compiler bug).
>
> I am not familiar with Neon (and not as good as I should be at ARM
> assembly in general), but it looks to me that you have used specific
> registers in your inline assembly, and assumed specific registers for
> compiler use (such as variables). Don't do that. When you have turned
> off all optimisation, the compiler is consistent about which registers
> it uses for different purposes - when optimising, it changes register
> usage in a very unpredictable way. You must be explicit - all data
> going into your assembly must be declared, as must all data coming out
> of the assembly. And if you use specific registers, you need to tell
> the compiler about them (as "clobbers") - and be aware that the compiler
> might be using those registers for the input or output values.
>
> Getting inline assembly right is not easy, and it is often best to work
> with several small assembly statements rather than large ones - I
> usually make a "static inline" function around a line or two of inline
> assembly and then use that function in the code as needed. It can make
> the result a lot clearer, and makes it easier to mix the C and assembly
> - the end result is often better than I would make in pure assembly.
>
> Finally, is there a good reason why you need inline assembly rather than
> the neon intrinsics provided by gcc?
>
> <http://gcc.gnu.org/onlinedocs/gcc/ARM-NEON-Intrinsics.html>
>
>
> mvh.,
>
> David
>
>
> On 19/02/14 20:04, Cody Rigney wrote:
>> Hi,
>>
>> I'm trying to add NEON optimizations to OpenCV's LK optical flow. See
>> link below.
>> https://github.com/Itseez/opencv/blob/2.4/modules/video/src/lkpyramid.cpp
>>
>> The gcc version could vary since this is an open source project, but
>> the one I'm currently using is 4.8.1. The target architecture is ARMv7
>> w/ NEON. The processor I'm testing on is an ARM
>> Cortex-A15(big.LITTLE).
>>
>> The problem is, in release mode (where optimizations are set) it does
>> not work properly.
>> However, in debug mode, it works fine. I tracked
>> down a specific variable (FLT_SCALE) that was being optimized out and
>> made it volatile and that part worked fine after that. However, I'm
>> still having incorrect behavior from some other optimization. I'm new
>> to inline assembly, so I thought maybe I'm doing something wrong
>> that's not telling the compiler that I'm using a certain variable.
>>
>> Below is the code at its current state. Ignore all the comments and
>> volatiles (for testing this problem) everywhere. It's WIP. I removed
>> unnecessary functions and code so it would be easier to see. I think
>> the problem is in the bottom-most asm block because if I do if(false)
>> to skip it, I don't run into the problem. Thanks.
>>
>
> <snip>
>
>
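
For reference, here is a minimal sketch of the "static inline" wrapper
pattern described in the quoted advice. It is only an illustration under
stated assumptions, not code from the thread: it assumes GCC-style
extended asm on 32-bit ARM with NEON enabled, and the helper name madd4
and its operation (out = a * scale + b on four floats) are hypothetical.
On any other target it falls back to plain C so it still compiles.

```c
#include <assert.h>

#if defined(__arm__) && (defined(__ARM_NEON__) || defined(__ARM_NEON))
#include <arm_neon.h>
#define HAVE_NEON_ASM 1   /* 32-bit ARM with NEON: use the asm path */
#else
#define HAVE_NEON_ASM 0   /* anything else: portable C fallback */
#endif

/* Hypothetical helper: out[i] = a[i] * scale + b[i] for i = 0..3. */
static inline void madd4(float *out, const float *a, const float *b,
                         float scale)
{
    assert(out && a && b);
#if HAVE_NEON_ASM
    float32x4_t va  = vld1q_f32(a);        /* multiplicand */
    float32x4_t acc = vld1q_f32(b);        /* accumulator, starts as b */
    float32x4_t vs  = vdupq_n_f32(scale);  /* scale in all four lanes */

    /* One instruction, no hard-coded register names.
     * "+w"(acc): acc is both read and written, in a NEON register.
     * "w"(va), "w"(vs): inputs, in NEON registers of the compiler's choice.
     * %q0 etc. print the operand as a quad (q) register.
     * Only declared operands are touched, so no clobber list is needed. */
    __asm__("vmla.f32 %q0, %q1, %q2"
            : "+w"(acc)
            : "w"(va), "w"(vs));

    vst1q_f32(out, acc);
#else
    for (int i = 0; i < 4; ++i)
        out[i] = a[i] * scale + b[i];
#endif
}
```

Because every value going in or out is declared as an operand, the
compiler is free to pick registers at any optimisation level, and
nothing has to be marked volatile to keep it alive. A call would look
like madd4(out, a, b, 2.0f) with a, b and out each pointing at four
floats.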