Re: Why vectorization didn't turn on by -O2

Hongtao Liu via Gcc-help <gcc-help@xxxxxxxxxxx> · Tue, 24 Aug 2021 10:21:44 +0800



On Mon, Aug 16, 2021 at 2:09 PM Hongtao Liu <crazylht@xxxxxxxxx> wrote:
>
> On Mon, Aug 16, 2021 at 2:00 PM Hongtao Liu <crazylht@xxxxxxxxx> wrote:
> >
> > On Mon, Aug 16, 2021 at 11:23 AM Kewen.Lin via Gcc-help
> > <gcc-help@xxxxxxxxxxx> wrote:
> > >
> > > > on 2021/8/4 4:31 PM, Richard Biener wrote:
> > > > On Wed, 4 Aug 2021, Richard Sandiford wrote:
> > > >
> > > >> Hongtao Liu <crazylht@xxxxxxxxx> writes:
> > > >>> On Tue, May 18, 2021 at 4:27 AM Richard Sandiford via Gcc-help
> > > >>> <gcc-help@xxxxxxxxxxx> wrote:
> > > >>>>
> > > >>>> Jan Hubicka <hubicka@xxxxxx> writes:
> > > >>>>> Hi,
> > > >>>>> here are updated scores.
> > > >>>>> https://lnt.opensuse.org/db_default/v4/SPEC/latest_runs_report?younger_in_days=14&older_in_days=0&all_elf_detail_stats=on&min_percentage_change=0.001&revisions=9388fc7bf0da61a8104e8501e5965120e9159e12%2Cea21f32198432a490dd490696322838d94b3d3b2%2C4f5431c5768bbba81a422f6fed6a6e2454c700ee%2C&include_user_branches=on
> > > >>>>> compares
> > > >>>>>   base:  mainline
> > > >>>>>   1st column: mainline with very cheap vectorization at -O2 and -O3
> > > >>>>>   2nd column: mainline with cheap vectorization at -O2 and -O3.
> > > >>>>>
> > > >>>>> The short story is:
> > > >>>>>
> > > >>>>> 1) -O2 generic performance
> > > >>>>>     kabylake (Intel):
> > > >>>>>                                        very    cheap
> > > >>>>>       SPEC/SPEC2006/FP/total           ~       8.32%
> > > >>>>>       SPEC/SPEC2006/total              -0.38%  4.74%
> > > >>>>>       SPEC/SPEC2006/INT/total          -0.91%  -0.14%
> > > >>>>>
> > > >>>>>       SPEC/SPEC2017/INT/total          4.71%   7.11%
> > > >>>>>       SPEC/SPEC2017/total              2.22%   6.52%
> > > >>>>>       SPEC/SPEC2017/FP/total           0.34%   6.06%
> > > >>>>>     zen
> > > >>>>>       SPEC/SPEC2006/FP/total           0.61%   10.23%
> > > >>>>>       SPEC/SPEC2006/total              0.26%   6.27%
> > > >>>>>       SPEC/SPEC2006/INT/total  34.006  -0.24%  0.90%
> > > >>>>>
> > > >>>>>       SPEC/SPEC2017/INT/total  3.937   5.34%   7.80%
> > > >>>>>       SPEC/SPEC2017/total              3.02%   6.55%
> > > >>>>>       SPEC/SPEC2017/FP/total           1.26%   5.60%
> > > >>>>>
> > > >>>>>  2) -O2 size:
> > > >>>>>      -0.78% (very cheap) 6.51% (cheap) for spec2k2006
> > > >>>>>      -0.32% (very cheap) 6.75% (cheap) for spec2k2017
> > > >>>>>  3) build times:
> > > >>>>>      0%, 0.16%, 0.71%, 0.93% (very cheap) 6.05% 4.80% 6.75% 7.15% (cheap) for spec2k2006
> > > >>>>>      0.39% 0.57% 0.71%       (very cheap) 5.40% 6.23% 8.44%       (cheap) for spec2k2017
> > > >>>>>     Here I simply copied data from the different configurations.
> > > >>>>>
> > > >>>>> So for SPEC I would say that most of the compile-time cost derives
> > > >>>>> from code size growth, which is a problem with the cheap model but not
> > > >>>>> with very cheap.  Very cheap indeed results in code size improvements, and
> > > >>>>> its compile-time impact is probably somewhere around 0.5%.
> > > >>>>>
> > > >>>>> So from these scores alone, vectorization at -O2 with the very cheap
> > > >>>>> model seems to make sense to me (I am sure we have other optimizations
> > > >>>>> with worse benefit-to-compile-time tradeoffs).
> > > >>>>
> > > >>>> Thanks for running these.
> > > >>>>
> > > >>>> The biggest issue I know of for enabling very-cheap at -O2 is:
> > > >>>>
> > > >>>>    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100089
> > > >>>>
> > > >>>> Perhaps we could get around that by (hopefully temporarily) disabling
> > > >>>> BB SLP within loop vectorisation for the very-cheap model.  This would
> > > >>>> purely be a workaround and we should remove it once the PR is fixed.
> > > >>>> (It would even be a compile-time win in the meantime :-))
Fixed by

commit r12-3103-g819b7c3a339e3bdaf85cd55954c5536bd98aae09
Author: liuhongt <hongtao.liu@xxxxxxxxx>
Date:   Wed Aug 4 16:39:31 2021 +0800

    Disable slp in loop vectorizer when cost model is very-cheap.

    Performance impact for the commit with option:
    -march=x86-64 -O2 -ftree-vectorize -fvect-cost-model=very-cheap

    SPEC2017 fprate
    503.bwaves_r        BuildSame
    507.cactuBSSN_r         -0.04
    508.namd_r               0.14
    510.parest_r            -0.54
    511.povray_r             0.10
    519.lbm_r           BuildSame
    521.wrf_r                0.64
    526.blender_r           -0.32
    527.cam4_r               0.17
    538.imagick_r            0.09
    544.nab_r           BuildSame
    549.fotonik3d_r     BuildSame
    554.roms_r          BuildSame
    997.specrand_fr         -0.09
    Geometric mean:  0.02

    SPEC2017 intrate
    500.perlbench_r          0.26
    502.gcc_r                0.21
    505.mcf_r               -0.09
    520.omnetpp_r       BuildSame
    523.xalancbmk_r     BuildSame
    525.x264_r              -0.41
    531.deepsjeng_r     BuildSame
    541.leela_r              0.13
    548.exchange2_r     BuildSame
    557.xz_r            BuildSame
    999.specrand_ir     BuildSame
    Geometric mean:  0.02

    EEMBC: no regressions; each benchmark either improved or built the same.
    The improved benchmarks are listed below.

    mp2decoddata1       7.59
    mp2decoddata2       31.80
    mp2decoddata3       12.15
    mp2decoddata4       11.16
    mp2decoddata5       11.19
    mp2decoddata1       7.06
    mp2decoddata2       24.12
    mp2decoddata3       10.83
    mp2decoddata4       10.04
    mp2decoddata5       10.07

    gcc/ChangeLog:

            PR tree-optimization/100089
            * tree-vectorizer.c (try_vectorize_loop_1): Disable slp in
            loop vectorizer when cost model is very-cheap.
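
For illustration only, here is a minimal C sketch of the kind of loop the
very-cheap model is meant to accept.  It is not taken from the commit or
from the GCC testsuite; the arrays, N, fill_inputs and add_arrays are
made-up names.  As the GCC manual describes it, very-cheap only allows
vectorization when the vector code entirely replaces the scalar code, so
the sketch uses a compile-time trip count that is a multiple of the usual
vector lengths and global arrays that need no runtime alias check.

```c
#include <stddef.h>

/* Hypothetical example: compile-time trip count, a multiple of 4, 8 and 16,
   so no scalar epilogue loop should be needed. */
#define N 1024

/* Distinct global arrays, so the compiler can see they do not overlap. */
static float a[N], b[N], c[N];

/* Fill the inputs with simple, exactly representable values. */
void fill_inputs(void)
{
    for (size_t i = 0; i < N; i++) {
        a[i] = (float)i;
        b[i] = 2.0f * (float)i;
    }
}

/* Element-wise add: the candidate loop for vectorization. */
void add_arrays(void)
{
    for (size_t i = 0; i < N; i++)
        c[i] = a[i] + b[i];
}
```

Assuming the sketch is saved as example.c (a hypothetical file name),
something like

    gcc -O2 -ftree-vectorize -fvect-cost-model=very-cheap -fopt-info-vec -c example.c

should report whether the loop in add_arrays was vectorized.  The outcome
depends on the target and the GCC version, so treat it as an experiment
rather than a guaranteed result.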


-- 
BR,
Hongtao