On 25/03/14 04:31, Xinrong Fu wrote:
> Hi guys:
> What does the number of stalled cycles in the CPU pipeline frontend
> means? Why is the stalled frontend cycles of 32bit program more than
> 64bit program's stalled cycles when they running on same 64bit system?
> Is there any gcc options to fix it?
>

Are you asking why the same program runs faster when compiled as 64-bit
rather than 32-bit?

There are /many/ reasons why 64-bit x86 code can be faster than 32-bit
x86 code. Without any idea about your code, we can only make general
points.

Compared to 32-bit x86, the 64-bit mode has access to more registers,
has wider registers (which speeds data movement), less complicated
instruction decoding and instruction prefixes, more efficient floating
point (SSE by default rather than x87), and much more efficient calling
conventions (arguments are passed in registers rather than on the
stack). It has the disadvantage that pointers are twice the size, so
they take up twice as much data cache and memory bandwidth.

As for gcc options to "fix" it, there is no problem to fix. It is normal
for 64-bit code to be a bit more efficient than 32-bit code built from
the same source, but the details vary according to the code in question.

One thing I notice from your post is that you are compiling without
enabling optimisation, which cripples the performance of the generated
code. Enabling "-O2" will probably make your code several times faster
(again, without information on the program, I can only make general
statements). Other optimisation settings like "-Os" and "-O3", and
individual optimisation flags, may or may not make the code faster, but
"-O2" is a good start.