Jan Hubicka <hubicka@xxxxxx> writes:
> Hi,
> here are updated scores.
> https://lnt.opensuse.org/db_default/v4/SPEC/latest_runs_report?younger_in_days=14&older_in_days=0&all_elf_detail_stats=on&min_percentage_change=0.001&revisions=9388fc7bf0da61a8104e8501e5965120e9159e12%2Cea21f32198432a490dd490696322838d94b3d3b2%2C4f5431c5768bbba81a422f6fed6a6e2454c700ee%2C&include_user_branches=on
> compares
>   base: mainline
>   1st column: mainline with very cheap vectorization at -O2 and -O3
>   2nd column: mainline with cheap vectorization at -O2 and -O3.
>
> The short story is:
>
> 1) -O2 generic performance
>     kabylake (Intel):
>                                           very cheap   cheap
>         SPEC/SPEC2006/FP/total            ~            8.32%
>         SPEC/SPEC2006/total               -0.38%       4.74%
>         SPEC/SPEC2006/INT/total           -0.91%       -0.14%
>
>         SPEC/SPEC2017/INT/total           4.71%        7.11%
>         SPEC/SPEC2017/total               2.22%        6.52%
>         SPEC/SPEC2017/FP/total            0.34%        6.06%
>     zen
>         SPEC/SPEC2006/FP/total            0.61%        10.23%
>         SPEC/SPEC2006/total               0.26%        6.27%
>         SPEC/SPEC2006/INT/total  34.006   -0.24%       0.90%
>
>         SPEC/SPEC2017/INT/total  3.937    5.34%        7.80%
>         SPEC/SPEC2017/total               3.02%        6.55%
>         SPEC/SPEC2017/FP/total            1.26%        5.60%
>
> 2) -O2 size:
>      -0.78% (very cheap) 6.51% (cheap) for spec2k2006
>      -0.32% (very cheap) 6.75% (cheap) for spec2k2017
> 3) build times:
>      0%, 0.16%, 0.71%, 0.93% (very cheap) 6.05% 4.80% 6.75% 7.15% (cheap) for spec2k2006
>      0.39% 0.57% 0.71% (very cheap) 5.40% 6.23% 8.44% (cheap) for spec2k2017
>    here I simply copied data from different configurations
>
> So for SPEC I would say that most of the compile time costs are derived
> from code size growth, which is a problem with the cheap model but not
> with very cheap.  Very cheap indeed results in code size improvements,
> and the compile time impact is probably somewhere around 0.5%.
>
> So from these scores alone it would seem to me that vectorization makes
> sense at -O2 with the very cheap model (I am sure we have other
> optimizations with worse benefit to compile time tradeoffs).

Thanks for running these.
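
As a quick sanity check on the "around 0.5%" estimate quoted above, here is a minimal sketch that takes a plain unweighted mean of the build-time deltas listed under 3). The variable names are mine, and treating all configurations as equally weighted is an assumption, since the numbers were copied from different configurations.

```python
from statistics import mean

# Build-time deltas (percent) copied from item 3) of the quoted mail.
# Grouping them into these two dicts is purely illustrative.
very_cheap = {
    "spec2k2006": [0.0, 0.16, 0.71, 0.93],
    "spec2k2017": [0.39, 0.57, 0.71],
}
cheap = {
    "spec2k2006": [6.05, 4.80, 6.75, 7.15],
    "spec2k2017": [5.40, 6.23, 8.44],
}


def overall_mean(deltas_by_suite):
    """Unweighted mean over every listed configuration (an assumption)."""
    all_deltas = [d for deltas in deltas_by_suite.values() for d in deltas]
    return mean(all_deltas)


very_cheap_mean = overall_mean(very_cheap)
cheap_mean = overall_mean(cheap)

print(f"very cheap: {very_cheap_mean:.2f}% average build-time increase")
print(f"cheap:      {cheap_mean:.2f}% average build-time increase")
```

With equal weighting, the very-cheap figure comes out just under 0.5%, in line with the estimate, while the cheap model lands above 6%.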

The biggest issue I know of for enabling very-cheap at -O2 is:

   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100089

Perhaps we could get around that by (hopefully temporarily) disabling
BB SLP within loop vectorisation for the very-cheap model.  This would
purely be a workaround and we should remove it once the PR is fixed.
(It would even be a compile-time win in the meantime :-))

Thanks,
Richard

> However there are the usual arguments against:
>
>   1) Vectorizer being tuned for SPEC.  I think the only way to overcome
>      that argument is to enable it by default :)
>   2) Workloads improved are more of -Ofast type workloads
>
> Here are non-spec benchmarks we track:
> https://lnt.opensuse.org/db_default/v4/CPP/latest_runs_report?younger_in_days=14&older_in_days=0&min_percentage_change=0.02&revisions=9388fc7bf0da61a8104e8501e5965120e9159e12%2Cea21f32198432a490dd490696322838d94b3d3b2%2C4f5431c5768bbba81a422f6fed6a6e2454c700ee%2C&include_user_branches=on
>
> I also tried to run Firefox some time ago.  Results are not surprising -
> vectorization helps rendering benchmarks, which are those compiled with
> aggressive flags anyway.
>
> Honza