(Not sure if I can send this to gcc-patches/gcc-bugs, as I have neither a patch nor a small reproducible testcase, so I am sending it to gcc-help.)

Hi,

This looks like a missed vectorization opportunity for one of the Fortran hot loops in cactusADM (SPEC CPU2006) when it is compiled with "-mcpu=cortex-a57 -Ofast". Interestingly, the generic tuning (plain "-Ofast" or "-O3" without any -mcpu option) does vectorize this hot loop, and this gives a noticeable runtime improvement on a native AArch64 platform.

I don't have a small reproducible testcase, so I am referring to the cactusADM benchmark here. The hot loop is in Bench_StaggeredLeapfrog2() in the StaggeredLeapfrog2.F file.

For cortex-a57, the vectorization report clearly states that scalar_cost < vector_cost / vectorization_factor, so the loop is not vectorized. For the generic case, the vector cost model is untuned (scalar_cost = vector_cost), so scalar_cost > vector_cost / vectorization_factor and the loop gets vectorized:

<< Output of generic vectorized case >>
StaggeredLeapfrog2.fppized.f.130t.vect:StaggeredLeapfrog2.fppized.f:362:0: note: LOOP VECTORIZED

I have also experimented with the cortexa57_vector_cost table (especially scalar_stmt_cost, vector_stmt_cost, vec_unaligned_cost, etc.), which influences the vectorization decision in this case. The values in the cortexa57_vector_cost table map directly to the costs given in the "Cortex(r)-A57 Software Optimisation Guide". However, it looks like there is further scope for tuning the cortexa57 vector costs so that such cases get vectorized.

Any comments on this missed opportunity?

Regards,
Saravanan

PS. I am not pasting the hot loop here, as there could be a licensing issue with using SPEC CPU2006 sources.