On Wed, 4 Aug 2021, Richard Sandiford wrote:

> Hongtao Liu <crazylht@xxxxxxxxx> writes:
> > On Tue, May 18, 2021 at 4:27 AM Richard Sandiford via Gcc-help
> > <gcc-help@xxxxxxxxxxx> wrote:
> >>
> >> Jan Hubicka <hubicka@xxxxxx> writes:
> >> > Hi,
> >> > here are the updated scores:
> >> > https://lnt.opensuse.org/db_default/v4/SPEC/latest_runs_report?younger_in_days=14&older_in_days=0&all_elf_detail_stats=on&min_percentage_change=0.001&revisions=9388fc7bf0da61a8104e8501e5965120e9159e12%2Cea21f32198432a490dd490696322838d94b3d3b2%2C4f5431c5768bbba81a422f6fed6a6e2454c700ee%2C&include_user_branches=on
> >> > It compares
> >> >   base: mainline
> >> >   1st column: mainline with very cheap vectorization at -O2 and -O3
> >> >   2nd column: mainline with cheap vectorization at -O2 and -O3.
> >> >
> >> > The short story is:
> >> >
> >> > 1) -O2 generic performance
> >> >    kabylake (Intel):
> >> >                                      very cheap   cheap
> >> >      SPEC/SPEC2006/FP/total            ~           8.32%
> >> >      SPEC/SPEC2006/total              -0.38%       4.74%
> >> >      SPEC/SPEC2006/INT/total          -0.91%      -0.14%
> >> >
> >> >      SPEC/SPEC2017/INT/total           4.71%       7.11%
> >> >      SPEC/SPEC2017/total               2.22%       6.52%
> >> >      SPEC/SPEC2017/FP/total            0.34%       6.06%
> >> >    zen:
> >> >      SPEC/SPEC2006/FP/total            0.61%      10.23%
> >> >      SPEC/SPEC2006/total               0.26%       6.27%
> >> >      SPEC/SPEC2006/INT/total  34.006  -0.24%       0.90%
> >> >
> >> >      SPEC/SPEC2017/INT/total   3.937   5.34%       7.80%
> >> >      SPEC/SPEC2017/total               3.02%       6.55%
> >> >      SPEC/SPEC2017/FP/total            1.26%       5.60%
> >> >
> >> > 2) -O2 size:
> >> >    -0.78% (very cheap) 6.51% (cheap) for spec2k2006
> >> >    -0.32% (very cheap) 6.75% (cheap) for spec2k2017
> >> > 3) build times:
> >> >    0%, 0.16%, 0.71%, 0.93% (very cheap) 6.05% 4.80% 6.75% 7.15% (cheap) for spec2k2006
> >> >    0.39% 0.57% 0.71% (very cheap) 5.40% 6.23% 8.44% (cheap) for spec2k2017
> >> >    (here I simply copied data from the different configurations)
> >> >
> >> > So for SPEC I would say that most of the compile-time cost is derived
> >> > from code size growth, which is a problem with the cheap model but not
> >> > with very cheap.  Very cheap indeed results in code size improvements,
> >> > and the compile-time impact is probably somewhere around 0.5%.
> >> >
> >> > So from these scores alone it would seem to me that vectorization makes
> >> > sense at -O2 with the very cheap model (I am sure we have other
> >> > optimizations with worse benefit-to-compile-time tradeoffs).
> >>
> >> Thanks for running these.
> >>
> >> The biggest issue I know of for enabling very-cheap at -O2 is:
> >>
> >>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100089
> >>
> >> Perhaps we could get around that by (hopefully temporarily) disabling
> >> BB SLP within loop vectorisation for the very-cheap model.  This would
> >> purely be a workaround and we should remove it once the PR is fixed.
> >> (It would even be a compile-time win in the meantime :-))
> >>
> >> Thanks,
> >> Richard
> >>
> >> > However, there are the usual arguments against:
> >> >
> >> > 1) The vectorizer being tuned for SPEC.  I think the only way to
> >> >    overcome that argument is to enable it by default :)
> >> > 2) The workloads improved are more of the -Ofast type.
> >> >
> >> > Here are the non-SPEC benchmarks we track:
> >> > https://lnt.opensuse.org/db_default/v4/CPP/latest_runs_report?younger_in_days=14&older_in_days=0&min_percentage_change=0.02&revisions=9388fc7bf0da61a8104e8501e5965120e9159e12%2Cea21f32198432a490dd490696322838d94b3d3b2%2C4f5431c5768bbba81a422f6fed6a6e2454c700ee%2C&include_user_branches=on
> >> >
> >> > I also tried to run Firefox some time ago.
> >> > Results are not surprising: vectorization helps rendering benchmarks,
> >> > which are the ones compiled with aggressive flags anyway.
> >> >
> >> > Honza
> >
> > Hi:
> >   I would like to ask whether we can turn on -O2 vectorization now?
>
> I think we still need to deal with the PR100089 issue that I mentioned above.
>
> Like I say, "dealing with" it could be as simple as disabling:
>
>       /* If we applied if-conversion then try to vectorize the
>          BB of innermost loops.
>          ???  Ideally BB vectorization would learn to vectorize
>          control flow by applying if-conversion on-the-fly, the
>          following retains the if-converted loop body even when
>          only non-if-converted parts took part in BB vectorization.  */
>       if (flag_tree_slp_vectorize != 0
>           && loop_vectorized_call
>           && ! loop->inner)
>
> for the very-cheap vector cost model until the PR is fixed properly.

Alternatively, only enable loop vectorization at -O2 (the above checks
flag_tree_slp_vectorize as well).  At least the cost model kind does not
have any influence on BB vectorization, that is, we get the same pros and
cons as we do for -O3.

Did anyone benchmark -O2 -ftree-{loop,slp}-vectorize separately yet?

Richard.
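
As a concrete starting point for such a comparison, here is a minimal
sketch.  The kernels and the exact invocations below are illustrative
assumptions, not something taken from the thread or from the SPEC runs
above.

    /* A counted loop: the kind of code targeted by the loop vectorizer
       (-ftree-loop-vectorize, subject to -fvect-cost-model=...).  */
    void
    scale (float *restrict dst, const float *restrict src, float a, int n)
    {
      for (int i = 0; i < n; i++)
        dst[i] = a * src[i];
    }

    /* Straight-line code on adjacent elements: the kind of code targeted
       by BB SLP (-ftree-slp-vectorize), which the cost model kind does
       not affect.  */
    void
    add4 (float *restrict dst, const float *restrict a, const float *restrict b)
    {
      dst[0] = a[0] + b[0];
      dst[1] = a[1] + b[1];
      dst[2] = a[2] + b[2];
      dst[3] = a[3] + b[3];
    }

The two passes could then be benchmarked separately with something like

    gcc -O2 -ftree-loop-vectorize -fvect-cost-model=very-cheap ...
    gcc -O2 -ftree-slp-vectorize ...

(plain -ftree-vectorize enables both), comparing each against -O2 and -O3
baselines.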