Re: Why vectorization didn't turn on by -O2

Segher Boessenkool <segher@xxxxxxxxxxxxxxxxxxx> · Wed, 4 Aug 2021 04:56:43 -0500

On Wed, Aug 04, 2021 at 10:10:36AM +0100, Richard Sandiford wrote:
> Richard Biener <rguenther@xxxxxxx> writes:
> > Alternatively only enable loop vectorization at -O2 (the above checks
> > flag_tree_slp_vectorize as well).  At least the cost model kind
> > does not have any influence on BB vectorization, that is, we get the
> > same pros and cons as we do for -O3.
> 
> Yeah, but a lot of the loop vector cost model choice is about controlling
> code size growth and avoiding excessive runtime versioning tests.

Both of those depend a lot on the target, and on target-specific
conditions as well (which CPU model is selected, for example).  Can we
factor that in somehow?  Maybe we need a target hook that returns the
expected percentage of code growth for vectorising a given loop, for
example, with -O2 vs. -O3 then selecting what percentage is acceptable.
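
A rough sketch of what such a hook could look like, purely as an
illustration.  None of this is an existing GCC interface: the struct,
the function names, and the 50% / 300% budgets are all made up, and a
real hook would presumably look at the loop itself and the selected CPU
rather than at two precomputed size estimates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of one vectorisation candidate.  Both fields are
   assumed size estimates in arbitrary units; the vector estimate would
   include any prologue, epilogue, and runtime versioning code.  */
struct vect_candidate
{
  unsigned scalar_size;
  unsigned vector_size;
};

/* Hypothetical target hook: expected code growth, in percent, if the
   candidate is vectorised.  Negative means the code is expected to
   shrink.  A real target would tune this per CPU model.  */
static int
expected_growth_percent (const struct vect_candidate *c)
{
  assert (c != NULL);
  if (c->scalar_size == 0)
    return 0;
  long delta = (long) c->vector_size - (long) c->scalar_size;
  return (int) (delta * 100 / (long) c->scalar_size);
}

/* Hypothetical policy: how much growth each optimisation level accepts.
   The numbers are placeholders, not measured or proposed values.  */
static int
acceptable_growth_percent (int opt_level)
{
  return opt_level >= 3 ? 300 : 50;
}

/* Vectorise only if the target's estimate fits the level's budget.  */
static int
should_vectorize (const struct vect_candidate *c, int opt_level)
{
  return expected_growth_percent (c) <= acceptable_growth_percent (opt_level);
}
```

With these placeholder budgets, a candidate expected to grow by 40%
would be accepted at both -O2 and -O3, while one expected to grow by
150% would be accepted only at -O3.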

> BB SLP
> should be a win on both code size and performance (barring significant
> target costing issues).

Yeah -- but this could use a similar hook as well (just for a
straight-line piece of code instead of a loop).

> PR100089 was an exception because we ended up keeping unvectorised
> scalar code that would never have existed otherwise.  BB SLP proper
> shouldn't have that problem.

It also is a tiny piece of code.  There will always be tiny examples
that are much worse (or much better) than average.

Segher