On Wed, Aug 04, 2021 at 11:22:53AM +0100, Richard Sandiford wrote:
> Segher Boessenkool <segher@xxxxxxxxxxxxxxxxxxx> writes:
> > On Wed, Aug 04, 2021 at 10:10:36AM +0100, Richard Sandiford wrote:
> >> Richard Biener <rguenther@xxxxxxx> writes:
> >> > Alternatively only enable loop vectorization at -O2 (the above checks
> >> > flag_tree_slp_vectorize as well). At least the cost model kind
> >> > does not have any influence on BB vectorization, that is, we get the
> >> > same pros and cons as we do for -O3.
> >>
> >> Yeah, but a lot of the loop vector cost model choice is about controlling
> >> code size growth and avoiding excessive runtime versioning tests.
> >
> > Both of those depend a lot on the target, and target-specific conditions
> > as well (which CPU model is selected for example). Can we factor that
> > in somehow? Maybe we need some target hook that returns the expected
> > percentage code growth for vectorising a given loop, for example, and
> > -O2 vs. -O3 then selects what percentage is acceptable.
> >
> >> BB SLP
> >> should be a win on both code size and performance (barring significant
> >> target costing issues).
> >
> > Yeah -- but this could use a similar hook as well (just a straightline
> > piece of code instead of a loop).
>
> I think anything like that should be driven by motivating use cases.
> It's not something that we can easily decide in the abstract.
>
> The results so far with using very-cheap at -O2 have been promising,
> so I don't think new hooks should block that becoming the default.

Right, but it wouldn't hurt to think a sec if we are on the right path
forward. It is crystal clear that to make good decisions about what
and how to vectorise you need to take *some* target characteristics into
account, and that will have to happen sooner rather than later.

This was all in reply to

> >> Yeah, but a lot of the loop vector cost model choice is about controlling
> >> code size growth and avoiding excessive runtime versioning tests.

It was not meant to hold up these patches :-)

> >> PR100089 was an exception because we ended up keeping unvectorised
> >> scalar code that would never have existed otherwise. BB SLP proper
> >> shouldn't have that problem.
> >
> > It also is a tiny piece of code. There will always be tiny examples
> > that are much worse (or much better) than average.
>
> Yeah, what makes PR100089 important isn't IMO the test itself, but the
> underlying problem that the PR exposed. Enabling this “BB SLP in loop
> vectorisation” code can lead to the generation of scalar COND_EXPRs even
> though we know that ifcvt doesn't have a proper cost model for deciding
> whether scalar COND_EXPRs are a win.
>
> Introducing scalar COND_EXPRs at -O3 is arguably an acceptable risk
> (although still dubious), but I think it's something we need to avoid
> for -O2, even if that means losing the optimisation.

Yeah -- -O2 should almost always do the right thing, while -O3 can do
bad things more often; it just has to be better "on average".


Segher