On Thu, Jan 25, 2018 at 3:02 PM, Florian Weimer <fweimer@xxxxxxxxxx> wrote:
> On 01/25/2018 03:52 PM, Peter Robinson wrote:
>>
>> On Thu, Jan 25, 2018 at 2:46 PM, Florian Weimer <fweimer@xxxxxxxxxx>
>> wrote:
>>>
>>> GCC offers a generic tuning option for Arm these days, but we select
>>> -mtune=cortex-a8 instead.
>>>
>>> Is this still a good choice?
>>
>>
>> I suspect the generic tuning is likely a better choice, are there any
>> details about it anywhere? Basically Cortex-A8 is pretty much the
>> lowest common denominator for ARMv7
>
>
> The generic tuning has this:

So reading the gcc docs [1] it seems that generic-armv7-a makes sense.
To quote "should tune the performance for a blend of processors within
architecture arch. The aim is to generate code that run well on the
current most popular processors, balancing between optimizations that
benefit some CPUs in the range, and avoiding performance pitfalls of
other CPUs."

We still support a number of Cortex-A8 devices but we have a lot more
Cortex-A7/9/15 devices these days too so I think generic makes sense
here.

Thanks,
Peter

[1] https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html

> /* Generic Cortex tuning. Use more specific tunings if appropriate. */
> const struct tune_params arm_cortex_tune =
> {
>   &generic_extra_costs,
>   &generic_addr_mode_costs, /* Addressing mode costs. */
>   NULL, /* Sched adj cost. */
>   arm_default_branch_cost,
>   &arm_default_vec_cost,
>   1, /* Constant limit. */
>   5, /* Max cond insns. */
>   8, /* Memset max inline. */
>   2, /* Issue rate. */
>   ARM_PREFETCH_NOT_BENEFICIAL,
>   tune_params::PREF_CONST_POOL_FALSE,
>   tune_params::PREF_LDRD_FALSE,
>   tune_params::LOG_OP_NON_SHORT_CIRCUIT_TRUE, /* Thumb. */
>   tune_params::LOG_OP_NON_SHORT_CIRCUIT_TRUE, /* ARM. */
>   tune_params::DISPARAGE_FLAGS_NEITHER,
>   tune_params::PREF_NEON_64_FALSE,
>   tune_params::PREF_NEON_STRINGOPS_FALSE,
>   tune_params::FUSE_NOTHING,
>   tune_params::SCHED_AUTOPREF_OFF
> };
>
> The Cortex-A8 tuning is:
>
> const struct tune_params arm_cortex_a8_tune =
> {
>   &cortexa8_extra_costs,
>   &generic_addr_mode_costs, /* Addressing mode costs. */
>   NULL, /* Sched adj cost. */
>   arm_default_branch_cost,
>   &arm_default_vec_cost,
>   1, /* Constant limit. */
>   5, /* Max cond insns. */
>   8, /* Memset max inline. */
>   2, /* Issue rate. */
>   ARM_PREFETCH_NOT_BENEFICIAL,
>   tune_params::PREF_CONST_POOL_FALSE,
>   tune_params::PREF_LDRD_FALSE,
>   tune_params::LOG_OP_NON_SHORT_CIRCUIT_TRUE, /* Thumb. */
>   tune_params::LOG_OP_NON_SHORT_CIRCUIT_TRUE, /* ARM. */
>   tune_params::DISPARAGE_FLAGS_NEITHER,
>   tune_params::PREF_NEON_64_FALSE,
>   tune_params::PREF_NEON_STRINGOPS_TRUE,
>   tune_params::FUSE_NOTHING,
>   tune_params::SCHED_AUTOPREF_OFF
> };
>
> The real difference is in generic_extra_costs vs cortexa8_extra_costs, and
> too large to include here. One of the differences seems to be that on
> Cortex-A8, floating point multiply & divide is considered relatively
> more expensive, if I read the sources correctly. But this is all a bit
> black magic.
>
> Thanks,
> Florian
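
As a rough aid to reading the two quoted tables side by side, here is a minimal C++ sketch that copies a handful of the quoted fields into a reduced struct and lists the ones that differ. The names TuneSketch, generic_cortex_sketch, cortex_a8_sketch, differing_fields and self_check are hypothetical and exist only for this illustration; this is not GCC's actual struct tune_params, and the contents of the extra-costs tables themselves are not modelled, only their names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, reduced mirror of a few fields from the quoted tables.
// The real struct tune_params in GCC's Arm backend has more fields and
// uses pointers and enums rather than strings and bools.
struct TuneSketch {
    std::string extra_costs;   // name of the extra-costs table referenced
    int constant_limit;        // "Constant limit."
    int max_cond_insns;        // "Max cond insns."
    int memset_max_inline;     // "Memset max inline."
    int issue_rate;            // "Issue rate."
    bool pref_neon_stringops;  // PREF_NEON_STRINGOPS_TRUE / _FALSE
};

// Values transcribed from the quoted arm_cortex_tune table.
inline TuneSketch generic_cortex_sketch() {
    return {"generic_extra_costs", 1, 5, 8, 2, false};
}

// Values transcribed from the quoted arm_cortex_a8_tune table.
inline TuneSketch cortex_a8_sketch() {
    return {"cortexa8_extra_costs", 1, 5, 8, 2, true};
}

// Return the names of the modelled fields whose values differ.
inline std::vector<std::string> differing_fields(const TuneSketch& a,
                                                 const TuneSketch& b) {
    std::vector<std::string> out;
    if (a.extra_costs != b.extra_costs) out.push_back("extra_costs");
    if (a.constant_limit != b.constant_limit) out.push_back("constant_limit");
    if (a.max_cond_insns != b.max_cond_insns) out.push_back("max_cond_insns");
    if (a.memset_max_inline != b.memset_max_inline) out.push_back("memset_max_inline");
    if (a.issue_rate != b.issue_rate) out.push_back("issue_rate");
    if (a.pref_neon_stringops != b.pref_neon_stringops) out.push_back("pref_neon_stringops");
    return out;
}

// Usage example: among the modelled fields, only the extra-costs table
// and the NEON string-ops preference differ between the two tunings.
inline void self_check() {
    const std::vector<std::string> diff =
        differing_fields(generic_cortex_sketch(), cortex_a8_sketch());
    assert(diff.size() == 2);
    assert(diff[0] == "extra_costs");
    assert(diff[1] == "pref_neon_stringops");
}
```

Among the fields quoted in the thread, this shows the two tunings differing only in which extra-costs table they reference and in the NEON string-ops preference. Whatever distinguishes the two extra-costs tables internally is outside what this sketch captures.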