On Wed, Aug 4, 2021 at 4:31 PM Richard Biener <rguenther@xxxxxxx> wrote:
>
> On Wed, 4 Aug 2021, Richard Sandiford wrote:
>
> > Hongtao Liu <crazylht@xxxxxxxxx> writes:
> > > On Tue, May 18, 2021 at 4:27 AM Richard Sandiford via Gcc-help
> > > <gcc-help@xxxxxxxxxxx> wrote:
> > >>
> > >> Jan Hubicka <hubicka@xxxxxx> writes:
> > >> > Hi,
> > >> > here are updated scores.
> > >> > https://lnt.opensuse.org/db_default/v4/SPEC/latest_runs_report?younger_in_days=14&older_in_days=0&all_elf_detail_stats=on&min_percentage_change=0.001&revisions=9388fc7bf0da61a8104e8501e5965120e9159e12%2Cea21f32198432a490dd490696322838d94b3d3b2%2C4f5431c5768bbba81a422f6fed6a6e2454c700ee%2C&include_user_branches=on
> > >> > compares
> > >> > base: mainline
> > >> > 1st column: mainline with very cheap vectorization at -O2 and -O3
> > >> > 2nd column: mainline with cheap vectorization at -O2 and -O3.
> > >> >
> > >> > The short story is:
> > >> >
> > >> > 1) -O2 generic performance
> > >> > kabylake (Intel):
> > >> >                          very cheap   cheap
> > >> > SPEC/SPEC2006/FP/total   ~ 8.32%
> > >> > SPEC/SPEC2006/total      -0.38% 4.74%
> > >> > SPEC/SPEC2006/INT/total  -0.91% -0.14%
> > >> >
> > >> > SPEC/SPEC2017/INT/total  4.71% 7.11%
> > >> > SPEC/SPEC2017/total      2.22% 6.52%
> > >> > SPEC/SPEC2017/FP/total   0.34% 6.06%
> > >> > zen
> > >> > SPEC/SPEC2006/FP/total   0.61% 10.23%
> > >> > SPEC/SPEC2006/total      0.26% 6.27%
> > >> > SPEC/SPEC2006/INT/total  34.006 -0.24% 0.90%
> > >> >
> > >> > SPEC/SPEC2017/INT/total  3.937 5.34% 7.80%
> > >> > SPEC/SPEC2017/total      3.02% 6.55%
> > >> > SPEC/SPEC2017/FP/total   1.26% 5.60%
> > >> >
> > >> > 2) -O2 size:
> > >> > -0.78% (very cheap) 6.51% (cheap) for spec2k2006
> > >> > -0.32% (very cheap) 6.75% (cheap) for spec2k2017
> > >> > 3) build times:
> > >> > 0%, 0.16%, 0.71%, 0.93% (very cheap) 6.05% 4.80% 6.75% 7.15% (cheap) for spec2k2006
> > >> > 0.39% 0.57% 0.71% (very cheap) 5.40% 6.23% 8.44% (cheap) for spec2k2017
> > >> > here I simply copied data from different configurations
> > >> >
> > >> > So for SPEC I would say that most of the compile-time costs are derived
> > >> > from code size growth, which is a problem with the cheap model but not with
> > >> > very cheap. Very cheap indeed results in code size improvements, and the
> > >> > compile-time impact is probably somewhere around 0.5%.
> > >> >
> > >> > So from these scores alone it would seem to me that vectorization makes
> > >> > sense at -O2 with the very cheap model (I am sure we have other
> > >> > optimizations with worse benefit to compile-time tradeoffs).
> > >>
> > >> Thanks for running these.
> > >>
> > >> The biggest issue I know of for enabling very-cheap at -O2 is:
> > >>
> > >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100089
> > >>
> > >> Perhaps we could get around that by (hopefully temporarily) disabling
> > >> BB SLP within loop vectorisation for the very-cheap model. This would
> > >> purely be a workaround and we should remove it once the PR is fixed.
> > >> (It would even be a compile-time win in the meantime :-))
> > >>
> > >> Thanks,
> > >> Richard
> > >>
> > >> > However, there are the usual arguments against:
> > >> >
> > >> > 1) The vectorizer being tuned for SPEC. I think the only way to overcome
> > >> > that argument is to enable it by default :)
> > >> > 2) The workloads improved are more of the -Ofast type
> > >> >
> > >> > Here are non-SPEC benchmarks we track:
> > >> > https://lnt.opensuse.org/db_default/v4/CPP/latest_runs_report?younger_in_days=14&older_in_days=0&min_percentage_change=0.02&revisions=9388fc7bf0da61a8104e8501e5965120e9159e12%2Cea21f32198432a490dd490696322838d94b3d3b2%2C4f5431c5768bbba81a422f6fed6a6e2454c700ee%2C&include_user_branches=on
> > >> >
> > >> > I also tried to run Firefox some time ago. The results are not surprising:
> > >> > vectorization helps the rendering benchmarks, which are those compiled with
> > >> > aggressive flags anyway.
> > >> >
> > >> > Honza
> > >
> > > Hi:
> > > I would like to ask if we can turn on -O2 vectorization now?
> > >
> > I think we still need to deal with the PR100089 issue that I mentioned above.
> > Like I say, “dealing with” it could be as simple as disabling:
> >
> >   /* If we applied if-conversion then try to vectorize the
> >      BB of innermost loops.
> >      ??? Ideally BB vectorization would learn to vectorize
> >      control flow by applying if-conversion on-the-fly, the
> >      following retains the if-converted loop body even when
> >      only non-if-converted parts took part in BB vectorization.  */
> >   if (flag_tree_slp_vectorize != 0
> >       && loop_vectorized_call
> >       && ! loop->inner)
> >
> > for the very-cheap vector cost model until the PR is fixed properly.
>
> Alternatively only enable loop vectorization at -O2 (the above checks
> flag_tree_slp_vectorize as well). At least the cost model kind
> does not have any influence on BB vectorization, that is, we get the
> same pros and cons as we do for -O3.
>
> Did anyone benchmark -O2 -ftree-{loop,slp}-vectorize separately yet?

I can collect 4 sets of data, covering both code size and performance, on SPEC2017:
1. baseline: -O2
2. baseline + both SLP and loop vectorizer: -O2 -ftree-vectorize -fvect-cost-model=very-cheap
3. baseline + only loop vectorizer: -O2 -ftree-loop-vectorize -fvect-cost-model=very-cheap
4. baseline + only BB vectorizer: -O2 -ftree-slp-vectorize

>
> Richard.

--
BR,
Hongtao