> one loop running from a VTK lib
> which takes up much processor time
> has not been aligned to 16 byte boundary

The -falign-loops option is a suggestion, not a requirement. Not all
loops are aligned to the value specified. GCC uses various heuristics to
decide whether a loop should be aligned, and it only aligns a loop if
doing so will not require more than a certain number of nops. Compiling
with profile feedback can help GCC make better decisions.

> shark also tells me this loop contains a single-precision floating
> point computation that could be speeded up using altivec
> -fast also turns on -maltivec

-maltivec is not the same as auto-vectorization. It only makes the
AltiVec instructions and intrinsics available; it does not vectorize
loops by itself. One can try auto-vectorization or manually convert the
loop to use AltiVec intrinsics.

David
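
For illustration, here is a minimal sketch of what the manual route might look like. It is not the VTK loop: the kernel (y[i] = a * x[i] + y[i]), the name saxpy and the assumption that both arrays are 16-byte aligned are all hypothetical. The AltiVec path is only compiled when the compiler defines __ALTIVEC__ (for example with -maltivec); otherwise the plain scalar loop runs. In GCC 4.x one could instead leave the scalar loop alone and try -ftree-vectorize together with -maltivec.

```c
#include <assert.h>
#include <stddef.h>

#ifdef __ALTIVEC__
#include <altivec.h>
#endif

/* Hypothetical kernel, not taken from VTK: y[i] = a * x[i] + y[i].
 * Assumption: x and y are 16-byte aligned when the AltiVec path is
 * compiled in, because vec_ld/vec_st ignore the low four address bits. */
static void saxpy(float a, const float *x, float *y, size_t n)
{
    size_t i = 0;

    assert(x != NULL && y != NULL);

#ifdef __ALTIVEC__
    {
        /* Copy the scalar into all four lanes through an aligned union. */
        union { vector float v; float f[4]; } s;
        s.f[0] = s.f[1] = s.f[2] = s.f[3] = a;

        /* Process four floats per iteration with a fused multiply-add. */
        for (; i + 4 <= n; i += 4) {
            vector float vx = vec_ld(0, &x[i]);
            vector float vy = vec_ld(0, &y[i]);
            vec_st(vec_madd(s.v, vx, vy), 0, &y[i]);
        }
    }
#endif

    /* Scalar loop: handles the tail, or everything without AltiVec. */
    for (; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

The arrays passed in can be declared with __attribute__((aligned(16))) to satisfy the alignment assumption. Whether this beats the auto-vectorizer on the real loop would have to be measured, e.g. with Shark.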