In the absence of a response, and after some digging, I decided the answer to the query is "no". So I moved the small amount of -O3 code into separate modules and compiled those without -flto. This seems to give the best of both: whole-program optimization for the bulk of the code, which is built at -O2, and aggressive optimization for the bits that benefit from it.
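
For anyone wanting to try the same arrangement, the build looks roughly like the sketch below. It assumes GCC, and the file names (bulk1.c, bulk2.c, hot.c, prog) are made up for illustration; substitute your own modules.

```sh
# Bulk of the code: -O2 with LTO (hypothetical file names)
gcc -O2 -flto -c bulk1.c -o bulk1.o
gcc -O2 -flto -c bulk2.c -o bulk2.o

# Performance-critical code in its own module: -O3, no -flto
gcc -O3 -c hot.c -o hot.o

# Link with -flto so the LTO objects are optimized together;
# hot.o is an ordinary object file and is linked as-is
gcc -O2 -flto bulk1.o bulk2.o hot.o -o prog
```

The trade-off is that hot.o takes no part in the link-time optimization, so calls between it and the rest of the program cannot be inlined across the module boundary.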