Hongtao Liu <crazylht@xxxxxxxxx> writes:
> On Tue, May 18, 2021 at 4:27 AM Richard Sandiford via Gcc-help
> <gcc-help@xxxxxxxxxxx> wrote:
>>
>> Jan Hubicka <hubicka@xxxxxx> writes:
>> > Hi,
>> > here are updated scores.
>> > https://lnt.opensuse.org/db_default/v4/SPEC/latest_runs_report?younger_in_days=14&older_in_days=0&all_elf_detail_stats=on&min_percentage_change=0.001&revisions=9388fc7bf0da61a8104e8501e5965120e9159e12%2Cea21f32198432a490dd490696322838d94b3d3b2%2C4f5431c5768bbba81a422f6fed6a6e2454c700ee%2C&include_user_branches=on
>> > compares
>> >   base: mainline
>> >   1st column: mainline with very cheap vectorization at -O2 and -O3
>> >   2nd column: mainline with cheap vectorization at -O2 and -O3.
>> >
>> > The short story is:
>> >
>> > 1) -O2 generic performance
>> >     kabylake (Intel):
>> >                                very cheap
>> >     SPEC/SPEC2006/FP/total     ~  8.32%
>> >     SPEC/SPEC2006/total        -0.38%  4.74%
>> >     SPEC/SPEC2006/INT/total    -0.91%  -0.14%
>> >
>> >     SPEC/SPEC2017/INT/total    4.71%  7.11%
>> >     SPEC/SPEC2017/total        2.22%  6.52%
>> >     SPEC/SPEC2017/FP/total     0.34%  6.06%
>> >     zen
>> >     SPEC/SPEC2006/FP/total     0.61%  10.23%
>> >     SPEC/SPEC2006/total        0.26%  6.27%
>> >     SPEC/SPEC2006/INT/total    34.006  -0.24%  0.90%
>> >
>> >     SPEC/SPEC2017/INT/total    3.937  5.34%  7.80%
>> >     SPEC/SPEC2017/total        3.02%  6.55%
>> >     SPEC/SPEC2017/FP/total     1.26%  5.60%
>> >
>> > 2) -O2 size:
>> >      -0.78% (very cheap) 6.51% (cheap) for spec2k2006
>> >      -0.32% (very cheap) 6.75% (cheap) for spec2k2017
>> > 3) build times:
>> >      0%, 0.16%, 0.71%, 0.93% (very cheap) 6.05% 4.80% 6.75% 7.15% (cheap) for spec2k2006
>> >      0.39% 0.57% 0.71% (very cheap) 5.40% 6.23% 8.44% (cheap) for spec2k2017
>> >    here I simply copied data from different configurations
>> >
>> > So for SPEC I would say that most of compile time costs are derived
>> > from code size growth which is a problem with cheap model but not with
>> > very cheap.
>> > Very cheap indeed results in code size improvements and
>> > compile time impact is probably somewhere around 0.5%
>> >
>> > So from these scores alone this would seem that vectorization makes
>> > sense at -O2 with very cheap model to me (I am sure we have other
>> > optimizations with worse benefits to compile time tradeoffs).
>>
>> Thanks for running these.
>>
>> The biggest issue I know of for enabling very-cheap at -O2 is:
>>
>>    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100089
>>
>> Perhaps we could get around that by (hopefully temporarily) disabling
>> BB SLP within loop vectorisation for the very-cheap model.  This would
>> purely be a workaround and we should remove it once the PR is fixed.
>> (It would even be a compile-time win in the meantime :-))
>>
>> Thanks,
>> Richard
>>
>> > However there are usual arguments against:
>> >
>> >   1) Vectorizer being tuned for SPEC.  I think the only way to overcome
>> >      that argument is to enable it by default :)
>> >   2) Workloads improved are more of -Ofast type workloads
>> >
>> > Here are non-spec benchmarks we track:
>> > https://lnt.opensuse.org/db_default/v4/CPP/latest_runs_report?younger_in_days=14&older_in_days=0&min_percentage_change=0.02&revisions=9388fc7bf0da61a8104e8501e5965120e9159e12%2Cea21f32198432a490dd490696322838d94b3d3b2%2C4f5431c5768bbba81a422f6fed6a6e2454c700ee%2C&include_user_branches=on
>> >
>> > I also tried to run Firefox some time ago.  Results are not surprising -
>> > vectorization helps rendering benchmarks which are those compiled with
>> > aggressive flags anyway.
>> >
>> > Honza
>
> Hi:
>   I would like to ask if we can turn on O2 vectorization now?

I think we still need to deal with the PR100089 issue that I mentioned
above.  Like I say, “dealing with” it could be as simple as disabling:

      /* If we applied if-conversion then try to vectorize the
         BB of innermost loops.
         ???  Ideally BB vectorization would learn to vectorize
         control flow by applying if-conversion on-the-fly, the
         following retains the if-converted loop body even when
         only non-if-converted parts took part in BB vectorization.  */
      if (flag_tree_slp_vectorize != 0
          && loop_vectorized_call
          && ! loop->inner)

for the very-cheap vector cost model until the PR is fixed properly.

Thanks,
Richard
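
For readers following the thread, here is a minimal standalone sketch of the
workaround described above, assuming it amounts to adding one extra test on
the cost model to the quoted guard.  It is not GCC source and not a patch:
CostModel, LoopInfo and should_try_bb_slp are hypothetical names that only
model the three conditions in the quoted `if`.

```cpp
// Standalone model of the guard quoted above; NOT GCC source.
// CostModel, LoopInfo and should_try_bb_slp are hypothetical names
// introduced only for illustration.

enum class CostModel { VeryCheap, Cheap, Dynamic, Unlimited };

struct LoopInfo {
  bool if_converted;    // stands in for a non-null `loop_vectorized_call`
  bool has_inner_loop;  // stands in for a non-null `loop->inner`
};

// Returns true when BB SLP would be attempted on the if-converted body
// of an innermost loop.
constexpr bool should_try_bb_slp(bool slp_enabled, LoopInfo loop,
                                 CostModel model) {
  return slp_enabled  // models `flag_tree_slp_vectorize != 0`
         && loop.if_converted
         && !loop.has_inner_loop
         // Assumed form of the temporary workaround: skip this step
         // for the very-cheap model until PR100089 is fixed.
         && model != CostModel::VeryCheap;
}

// Compile-time checks: only the very-cheap model changes behaviour.
static_assert(should_try_bb_slp(true, LoopInfo{true, false}, CostModel::Cheap),
              "cheap model still tries BB SLP");
static_assert(!should_try_bb_slp(true, LoopInfo{true, false},
                                 CostModel::VeryCheap),
              "very-cheap model skips BB SLP in this sketch");
```

In this sketch the other cost models are unaffected, so removing the extra
test once the PR is fixed would restore the original condition.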