>> 1) I understand that Intel Atom is an in-order processor and that the
>> last such processor Intel made was the original Pentium. How big is the
>> performance penalty if the code is not optimized for in-order execution?
>
> The compiler will already do its best to optimize for in-order
> execution, although it doesn't yet have any specific information about
> scheduling for the Atom. The penalty depends on the code. It can be a
> few percent.
>
>> 2) If the answer to question 1 is that it does give a noticeable penalty,
>> how does one control the output of gcc to produce in-order code? Should
>> I choose -march=pentium or is there a single option that is enabled by
>> the -march option that controls this behaviour? I've even searched for
>> "in-order" in gcc's source code but the few occurrences that turn up
>> don't solve this puzzle for me.
>
> No additional option is required. The relevant one is
> -fschedule-insns2, which is on by default at -O2.
>
>> 3) When it comes to cache sizes and available registers I think I
>> understand that the -mtune option controls this optimization (which is
>> automatically set to the same value as -march when using that). Since
>> there is no -march option for Intel Atom yet, is there any way to control
>> these optimizations manually in gcc and what should they be in that case?
>
> No, this requires specific support for the Atom.

Thanks for your input, Ian.

The following forum post gives a few opinions on what the optimization
flags should be, but unfortunately there is little or no feedback on it:

http://stackoverflow.com/questions/110674/gcc-optimization-flags-for-intel-atom/379908

-march=prescott -O2 -fomit-frame-pointer seems to be the general
consensus when googling, but I'm unable to find the reasoning for it. It
might just be sites referring to each other for all I know. But the first
answer to that forum question mentions -march=core2 -mtune=pentium
-mfpmath=sse.
-march=core2 because the Atom is claimed to be "merom ISA compliant" (I
have no idea what that means or how it is relevant to the -march option),
and -mtune=pentium because of the in-order issue mentioned above.
Unfortunately there is no feedback on that post. What is your opinion on
those claims?

The -mfpmath=sse confuses me too, since more googling turns up claims
that SSE instructions on the Atom take several times as many clock cycles
as they do on core2 or prescott (I can't find the reference right now,
unfortunately). I would value any input on those claims too.

Many thanks again.

Morgan
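
For anyone wanting to compare the two flag sets mentioned above rather
than take them on faith, here is a minimal shell sketch. It only prints
the compile command for each candidate and, if gcc is installed, asks gcc
whether -fschedule-insns2 ends up enabled. The file name bench.c, the
helper names, and the -O2 added to the second candidate are assumptions
of mine, not something taken from the forum post, and the output of -Q
depends on the GCC version.

```shell
#!/bin/sh
# Candidate flag sets quoted in this thread. Which one is faster on an
# Atom is NOT verified here.
CAND_A="-march=prescott -O2 -fomit-frame-pointer"
# Assumption: -O2 is added so that -fschedule-insns2 is switched on.
CAND_B="-march=core2 -mtune=pentium -mfpmath=sse -O2"

# build_cmd FLAGS SRC OUT: print the gcc command line for one candidate.
# (Hypothetical helper; bench.c below is a placeholder benchmark source.)
build_cmd() {
    printf 'gcc %s -o %s %s\n' "$1" "$3" "$2"
}

# sched2_state FLAGS: print gcc's own report line for -fschedule-insns2.
# Uses -Q --help=optimizers, available in GCC 4.3 and later.
sched2_state() {
    # $1 is deliberately unquoted so the flag string splits into words.
    gcc $1 -Q --help=optimizers 2>/dev/null | grep -e '-fschedule-insns2'
}

build_cmd "$CAND_A" bench.c bench_a
build_cmd "$CAND_B" bench.c bench_b

# Only query the compiler when one is installed.
if command -v gcc >/dev/null 2>&1; then
    sched2_state "$CAND_A" || echo "no report for: $CAND_A"
    sched2_state "$CAND_B" || echo "no report for: $CAND_B"
fi
```

Timing the two resulting binaries on the actual Atom hardware would still
be needed to settle which set is better for a given program.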