Hi,
I'm looking for help with improving the compile times of my unit tests. Compile
times of >60s per TU are making my life hard.
Alternatively, if you can tell me there's nothing I can do, then I can accept
my fate and stop worrying about compile time optimizations.
Picking one example from my ~500 unit test TUs:
You can see the source at
https://github.com/VcDevel/Vc/blob/c807aa0c841950e50ec7d370c9c22d6038c7e068/tests/loadstore.cpp
Note that TEST_TYPES (line 109) produces 91 instantiations of the
`load_store<VU>` function template from an outer product of two type lists.
(I'd actually like to make the type list larger by a factor of 13, but that
just blows the compiler up.)
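
To illustrate what "outer product of two type lists" means here, a minimal
sketch follows. It is not the actual Vc/virtest code: Typelist, Pair,
OuterProduct, touch and run_all are hypothetical names made up for this mail,
and the two lists are arbitrary small examples (3 x 4 = 12 pairs instead of
91). Every pair in the product causes one more instantiation of the test body,
which is why the number of functions the optimizer has to process grows
multiplicatively.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-ins; none of these names come from Vc or virtest.
template <class... Ts> struct Typelist {
    static constexpr std::size_t size = sizeof...(Ts);
};
template <class A, class B> struct Pair {};

// Concatenate any number of Typelists into one.
template <class... Ls> struct Concat;
template <> struct Concat<> { using type = Typelist<>; };
template <class... Ts> struct Concat<Typelist<Ts...>> {
    using type = Typelist<Ts...>;
};
template <class... As, class... Bs, class... Rest>
struct Concat<Typelist<As...>, Typelist<Bs...>, Rest...> {
    using type = typename Concat<Typelist<As..., Bs...>, Rest...>::type;
};

// Pair one type A with every type of the second list.
template <class A, class L> struct PairWithAll;
template <class A, class... Bs> struct PairWithAll<A, Typelist<Bs...>> {
    using type = Typelist<Pair<A, Bs>...>;
};

// Outer product: every type of the first list paired with every type of the second.
template <class L1, class L2> struct OuterProduct;
template <class... As, class L2> struct OuterProduct<Typelist<As...>, L2> {
    using type = typename Concat<typename PairWithAll<As, L2>::type...>::type;
};

// Stand-in for a test body; each distinct Pair<V, U> is a separate instantiation.
template <class V, class U> int touch(Pair<V, U>) { return 1; }

// Call the test body once per element of the list and count the calls.
template <class... Ps> int run_all(Typelist<Ps...>) {
    int n = 0;
    int dummy[] = {0, (n += touch(Ps{}), 0)...};
    (void)dummy;
    return n;
}

// Arbitrary example lists (assumption: not the lists behind TEST_TYPES).
using ListA = Typelist<float, double, int>;
using ListB = Typelist<char, short, int, long>;

inline int run_example() {
    using Product = OuterProduct<ListA, ListB>::type;
    static_assert(Product::size == ListA::size * ListB::size, "3 x 4 pairs");
    static_assert(std::is_same<OuterProduct<Typelist<int>, Typelist<char>>::type,
                               Typelist<Pair<int, char>>>::value,
                  "1 x 1 product is a single pair");
    int n = run_all(Product{});
    assert(n == 12);
    return n;
}
```

Calling run_example() instantiates touch for all 12 pairs. Extending either
list by one type adds a whole row or column of instantiations.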
Attached is the output of -ftime-report. I must say I was surprised to see
"phase opt and generate" take 62.5s, which is 95% of the total time, and account
for 84% of the memory usage. Though, OTOH, the resulting binary is 4.4 MiB, with
about 3/4 of it being the .text section.
If you have any ideas about what I could do (other than "test less"), I'd like
to try them.
Cheers,
Matthias
--
──────────────────────────────────────────────────────────────────────────
Dr. Matthias Kretz https://kretzfamily.de
GSI Helmholtzzentrum für Schwerionenforschung https://gsi.de
SIMD easy and portable https://github.com/VcDevel/Vc
──────────────────────────────────────────────────────────────────────────
g++-6 -ftime-report -DCOMPILE_FOR_UNIT_TESTS -DHAVE_CXX_ABI_H -DNO_ISA_CHECK -DTESTTYPES=float -DVc_USE_ALIASSTRATEGY_VECTORBUILTIN -I/home/mkretz/src/datapar -I/home/mkretz/src/datapar/tests/virtest -std=c++14 -Wno-ignored-attributes -W -Wall -Wswitch -Wformat -Wchar-subscripts -Wparentheses -Wmultichar -Wtrigraphs -Wpointer-arith -Wcast-align -Wreturn-type -pedantic -Wshadow -Wundef -Wold-style-cast -ftemplate-depth=512 -fmax-errors=10 -O2 -DNDEBUG -Wabi -fabi-version=0 -fabi-compat-version=0 -ffp-contract=fast -mavx2 -mbmi -mbmi2 -mlzcnt -mfma -MMD -MT tests/CMakeFiles/loadstore_avx2_vectorbuiltin_float.dir/loadstore.cpp.o -MF tests/CMakeFiles/loadstore_avx2_vectorbuiltin_float.dir/loadstore.cpp.o.d -o tests/CMakeFiles/loadstore_avx2_vectorbuiltin_float.dir/loadstore.cpp.o -c /home/mkretz/src/datapar/tests/loadstore.cpp
Execution times (seconds)
phase setup : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 1603 kB ( 0%) ggc
phase parsing : 0.95 ( 1%) usr 0.34 ( 6%) sys 1.29 ( 2%) wall 173444 kB ( 6%) ggc
phase lang. deferred : 2.21 ( 3%) usr 0.32 ( 6%) sys 2.53 ( 4%) wall 304946 kB (10%) ggc
phase opt and generate : 62.51 (95%) usr 4.64 (88%) sys 67.15 (95%) wall 2582438 kB (84%) ggc
|name lookup : 0.29 ( 0%) usr 0.07 ( 1%) sys 0.35 ( 0%) wall 57096 kB ( 2%) ggc
|overload resolution : 1.39 ( 2%) usr 0.25 ( 5%) sys 1.60 ( 2%) wall 244740 kB ( 8%) ggc
garbage collection : 2.02 ( 3%) usr 0.02 ( 0%) sys 2.06 ( 3%) wall 0 kB ( 0%) ggc
dump files : 0.19 ( 0%) usr 0.05 ( 1%) sys 0.18 ( 0%) wall 0 kB ( 0%) ggc
callgraph construction : 0.39 ( 1%) usr 0.04 ( 1%) sys 0.50 ( 1%) wall 27285 kB ( 1%) ggc
callgraph optimization : 0.50 ( 1%) usr 0.08 ( 2%) sys 0.45 ( 1%) wall 18936 kB ( 1%) ggc
ipa dead code removal : 0.13 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%) wall 0 kB ( 0%) ggc
ipa inheritance graph : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 6 kB ( 0%) ggc
ipa cp : 0.19 ( 0%) usr 0.00 ( 0%) sys 0.20 ( 0%) wall 17491 kB ( 1%) ggc
ipa inlining heuristics : 1.82 ( 3%) usr 0.14 ( 3%) sys 1.86 ( 3%) wall 32483 kB ( 1%) ggc
ipa function splitting : 0.10 ( 0%) usr 0.00 ( 0%) sys 0.11 ( 0%) wall 162 kB ( 0%) ggc
ipa comdats : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc
ipa reference : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall 0 kB ( 0%) ggc
ipa profile : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall 0 kB ( 0%) ggc
ipa pure const : 0.15 ( 0%) usr 0.00 ( 0%) sys 0.14 ( 0%) wall 36 kB ( 0%) ggc
ipa icf : 0.28 ( 0%) usr 0.00 ( 0%) sys 0.28 ( 0%) wall 19 kB ( 0%) ggc
ipa SRA : 0.26 ( 0%) usr 0.04 ( 1%) sys 0.36 ( 1%) wall 44294 kB ( 1%) ggc
ipa free inline summary : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc
cfg construction : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall 9880 kB ( 0%) ggc
cfg cleanup : 1.35 ( 2%) usr 0.02 ( 0%) sys 1.42 ( 2%) wall 23017 kB ( 1%) ggc
trivially dead code : 0.48 ( 1%) usr 0.00 ( 0%) sys 0.39 ( 1%) wall 0 kB ( 0%) ggc
df scan insns : 0.38 ( 1%) usr 0.02 ( 0%) sys 0.47 ( 1%) wall 107 kB ( 0%) ggc
df multiple defs : 0.20 ( 0%) usr 0.00 ( 0%) sys 0.17 ( 0%) wall 0 kB ( 0%) ggc
df reaching defs : 0.55 ( 1%) usr 0.00 ( 0%) sys 0.68 ( 1%) wall 0 kB ( 0%) ggc
df live regs : 3.05 ( 5%) usr 0.04 ( 1%) sys 2.95 ( 4%) wall 0 kB ( 0%) ggc
df live&initialized regs: 1.61 ( 2%) usr 0.01 ( 0%) sys 1.60 ( 2%) wall 0 kB ( 0%) ggc
df must-initialized regs: 0.17 ( 0%) usr 0.01 ( 0%) sys 0.08 ( 0%) wall 0 kB ( 0%) ggc
df use-def / def-use chains: 0.26 ( 0%) usr 0.00 ( 0%) sys 0.23 ( 0%) wall 0 kB ( 0%) ggc
df reg dead/unused notes: 1.12 ( 2%) usr 0.03 ( 1%) sys 1.27 ( 2%) wall 22388 kB ( 1%) ggc
register information : 0.34 ( 1%) usr 0.00 ( 0%) sys 0.26 ( 0%) wall 0 kB ( 0%) ggc
alias analysis : 0.64 ( 1%) usr 0.01 ( 0%) sys 0.49 ( 1%) wall 46933 kB ( 2%) ggc
alias stmt walking : 3.07 ( 5%) usr 0.18 ( 3%) sys 2.85 ( 4%) wall 3338 kB ( 0%) ggc
register scan : 0.13 ( 0%) usr 0.00 ( 0%) sys 0.09 ( 0%) wall 1319 kB ( 0%) ggc
rebuild jump labels : 0.24 ( 0%) usr 0.00 ( 0%) sys 0.24 ( 0%) wall 0 kB ( 0%) ggc
preprocessing : 0.10 ( 0%) usr 0.09 ( 2%) sys 0.19 ( 0%) wall 4583 kB ( 0%) ggc
parser (global) : 0.22 ( 0%) usr 0.08 ( 2%) sys 0.31 ( 0%) wall 65714 kB ( 2%) ggc
parser struct body : 0.08 ( 0%) usr 0.02 ( 0%) sys 0.09 ( 0%) wall 13819 kB ( 0%) ggc
parser function body : 0.03 ( 0%) usr 0.01 ( 0%) sys 0.02 ( 0%) wall 3162 kB ( 0%) ggc
parser inl. func. body : 0.20 ( 0%) usr 0.04 ( 1%) sys 0.25 ( 0%) wall 21910 kB ( 1%) ggc
parser inl. meth. body : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall 11125 kB ( 0%) ggc
template instantiation : 2.16 ( 3%) usr 0.42 ( 8%) sys 2.58 ( 4%) wall 357823 kB (12%) ggc
early inlining heuristics: 0.32 ( 0%) usr 0.02 ( 0%) sys 0.31 ( 0%) wall 45802 kB ( 1%) ggc
inline parameters : 0.21 ( 0%) usr 0.09 ( 2%) sys 0.48 ( 1%) wall 17286 kB ( 1%) ggc
integration : 3.13 ( 5%) usr 1.17 (22%) sys 4.30 ( 6%) wall 756390 kB (25%) ggc
tree gimplify : 0.24 ( 0%) usr 0.05 ( 1%) sys 0.22 ( 0%) wall 46440 kB ( 2%) ggc
tree eh : 0.13 ( 0%) usr 0.03 ( 1%) sys 0.20 ( 0%) wall 22312 kB ( 1%) ggc
tree CFG construction : 0.04 ( 0%) usr 0.01 ( 0%) sys 0.10 ( 0%) wall 28305 kB ( 1%) ggc
tree CFG cleanup : 1.47 ( 2%) usr 0.18 ( 3%) sys 1.80 ( 3%) wall 3102 kB ( 0%) ggc
tree tail merge : 0.26 ( 0%) usr 0.05 ( 1%) sys 0.22 ( 0%) wall 184 kB ( 0%) ggc
tree VRP : 1.32 ( 2%) usr 0.07 ( 1%) sys 1.48 ( 2%) wall 43557 kB ( 1%) ggc
tree copy propagation : 0.10 ( 0%) usr 0.01 ( 0%) sys 0.24 ( 0%) wall 5 kB ( 0%) ggc
tree PTA : 1.16 ( 2%) usr 0.06 ( 1%) sys 1.26 ( 2%) wall 8014 kB ( 0%) ggc
tree PHI insertion : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 3568 kB ( 0%) ggc
tree SSA rewrite : 0.55 ( 1%) usr 0.05 ( 1%) sys 0.48 ( 1%) wall 56626 kB ( 2%) ggc
tree SSA other : 0.05 ( 0%) usr 0.01 ( 0%) sys 0.05 ( 0%) wall 1868 kB ( 0%) ggc
tree SSA incremental : 0.90 ( 1%) usr 0.16 ( 3%) sys 1.25 ( 2%) wall 38069 kB ( 1%) ggc
tree operand scan : 1.29 ( 2%) usr 0.23 ( 4%) sys 1.41 ( 2%) wall 133496 kB ( 4%) ggc
dominator optimization : 1.44 ( 2%) usr 0.11 ( 2%) sys 1.44 ( 2%) wall 86088 kB ( 3%) ggc
tree SRA : 0.30 ( 0%) usr 0.06 ( 1%) sys 0.43 ( 1%) wall 16973 kB ( 1%) ggc
isolate eroneous paths : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall 0 kB ( 0%) ggc
tree CCP : 0.90 ( 1%) usr 0.12 ( 2%) sys 1.04 ( 1%) wall 24215 kB ( 1%) ggc
tree PHI const/copy prop: 0.01 ( 0%) usr 0.01 ( 0%) sys 0.06 ( 0%) wall 238 kB ( 0%) ggc
tree split crit edges : 0.11 ( 0%) usr 0.02 ( 0%) sys 0.14 ( 0%) wall 69973 kB ( 2%) ggc
tree reassociation : 0.08 ( 0%) usr 0.00 ( 0%) sys 0.07 ( 0%) wall 23 kB ( 0%) ggc
tree PRE : 1.49 ( 2%) usr 0.17 ( 3%) sys 1.87 ( 3%) wall 43911 kB ( 1%) ggc
tree FRE : 1.51 ( 2%) usr 0.21 ( 4%) sys 1.82 ( 3%) wall 26854 kB ( 1%) ggc
tree code sinking : 0.12 ( 0%) usr 0.00 ( 0%) sys 0.11 ( 0%) wall 23981 kB ( 1%) ggc
tree linearize phis : 0.02 ( 0%) usr 0.01 ( 0%) sys 0.04 ( 0%) wall 1197 kB ( 0%) ggc
tree backward propagate : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 0 kB ( 0%) ggc
tree forward propagate : 0.44 ( 1%) usr 0.07 ( 1%) sys 0.49 ( 1%) wall 5803 kB ( 0%) ggc
tree phiprop : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc
tree conservative DCE : 0.35 ( 1%) usr 0.02 ( 0%) sys 0.41 ( 1%) wall 330 kB ( 0%) ggc
tree aggressive DCE : 0.37 ( 1%) usr 0.07 ( 1%) sys 0.44 ( 1%) wall 17102 kB ( 1%) ggc
tree DSE : 0.20 ( 0%) usr 0.02 ( 0%) sys 0.19 ( 0%) wall 36 kB ( 0%) ggc
PHI merge : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 3 kB ( 0%) ggc
tree loop bounds : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 1490 kB ( 0%) ggc
tree loop invariant motion: 0.10 ( 0%) usr 0.00 ( 0%) sys 0.11 ( 0%) wall 0 kB ( 0%) ggc
tree canonical iv : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 1856 kB ( 0%) ggc
scev constant prop : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 1986 kB ( 0%) ggc
complete unrolling : 0.13 ( 0%) usr 0.02 ( 0%) sys 0.16 ( 0%) wall 9437 kB ( 0%) ggc
tree iv optimization : 0.18 ( 0%) usr 0.01 ( 0%) sys 0.15 ( 0%) wall 17242 kB ( 1%) ggc
tree copy headers : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 91 kB ( 0%) ggc
tree SSA uncprop : 0.09 ( 0%) usr 0.01 ( 0%) sys 0.05 ( 0%) wall 0 kB ( 0%) ggc
tree switch conversion : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc
tree strlen optimization: 0.05 ( 0%) usr 0.01 ( 0%) sys 0.03 ( 0%) wall 45 kB ( 0%) ggc
dominance frontiers : 0.19 ( 0%) usr 0.00 ( 0%) sys 0.13 ( 0%) wall 0 kB ( 0%) ggc
dominance computation : 1.63 ( 2%) usr 0.20 ( 4%) sys 1.74 ( 2%) wall 0 kB ( 0%) ggc
control dependences : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall 0 kB ( 0%) ggc
out of ssa : 0.13 ( 0%) usr 0.01 ( 0%) sys 0.11 ( 0%) wall 782 kB ( 0%) ggc
expand vars : 0.12 ( 0%) usr 0.00 ( 0%) sys 0.21 ( 0%) wall 16593 kB ( 1%) ggc
expand : 0.83 ( 1%) usr 0.02 ( 0%) sys 0.78 ( 1%) wall 215761 kB ( 7%) ggc
post expand cleanups : 0.10 ( 0%) usr 0.01 ( 0%) sys 0.12 ( 0%) wall 19073 kB ( 1%) ggc
lower subreg : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 5 kB ( 0%) ggc
forward prop : 0.35 ( 1%) usr 0.02 ( 0%) sys 0.38 ( 1%) wall 15065 kB ( 0%) ggc
CSE : 1.35 ( 2%) usr 0.03 ( 1%) sys 1.34 ( 2%) wall 10138 kB ( 0%) ggc
dead code elimination : 0.22 ( 0%) usr 0.00 ( 0%) sys 0.20 ( 0%) wall 0 kB ( 0%) ggc
dead store elim1 : 0.45 ( 1%) usr 0.00 ( 0%) sys 0.45 ( 1%) wall 15081 kB ( 0%) ggc
dead store elim2 : 0.61 ( 1%) usr 0.00 ( 0%) sys 0.62 ( 1%) wall 16283 kB ( 1%) ggc
loop analysis : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc
loop init : 0.74 ( 1%) usr 0.08 ( 2%) sys 0.69 ( 1%) wall 25907 kB ( 1%) ggc
loop invariant motion : 0.08 ( 0%) usr 0.00 ( 0%) sys 0.11 ( 0%) wall 452 kB ( 0%) ggc
loop fini : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall 0 kB ( 0%) ggc
CPROP : 1.18 ( 2%) usr 0.03 ( 1%) sys 1.48 ( 2%) wall 57451 kB ( 2%) ggc
PRE : 0.96 ( 1%) usr 0.00 ( 0%) sys 0.88 ( 1%) wall 12158 kB ( 0%) ggc
CSE 2 : 0.81 ( 1%) usr 0.00 ( 0%) sys 0.70 ( 1%) wall 4501 kB ( 0%) ggc
branch prediction : 0.21 ( 0%) usr 0.03 ( 1%) sys 0.24 ( 0%) wall 9857 kB ( 0%) ggc
combiner : 1.13 ( 2%) usr 0.01 ( 0%) sys 1.09 ( 2%) wall 41785 kB ( 1%) ggc
if-conversion : 0.14 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall 1484 kB ( 0%) ggc
integrated RA : 2.81 ( 4%) usr 0.03 ( 1%) sys 2.97 ( 4%) wall 131346 kB ( 4%) ggc
LRA non-specific : 0.79 ( 1%) usr 0.00 ( 0%) sys 0.99 ( 1%) wall 10009 kB ( 0%) ggc
LRA virtuals elimination: 0.21 ( 0%) usr 0.01 ( 0%) sys 0.17 ( 0%) wall 15474 kB ( 1%) ggc
LRA reload inheritance : 0.22 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall 241 kB ( 0%) ggc
LRA create live ranges : 0.79 ( 1%) usr 0.00 ( 0%) sys 0.88 ( 1%) wall 2857 kB ( 0%) ggc
LRA hard reg assignment : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall 0 kB ( 0%) ggc
LRA rematerialization : 0.08 ( 0%) usr 0.01 ( 0%) sys 0.07 ( 0%) wall 0 kB ( 0%) ggc
reload : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 0 kB ( 0%) ggc
reload CSE regs : 1.21 ( 2%) usr 0.00 ( 0%) sys 1.29 ( 2%) wall 30066 kB ( 1%) ggc
ree : 0.09 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall 156 kB ( 0%) ggc
thread pro- & epilogue : 0.23 ( 0%) usr 0.00 ( 0%) sys 0.26 ( 0%) wall 3920 kB ( 0%) ggc
if-conversion 2 : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.10 ( 0%) wall 6 kB ( 0%) ggc
combine stack adjustments: 0.03 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall 8 kB ( 0%) ggc
peephole 2 : 0.18 ( 0%) usr 0.00 ( 0%) sys 0.14 ( 0%) wall 1855 kB ( 0%) ggc
hard reg cprop : 0.17 ( 0%) usr 0.00 ( 0%) sys 0.20 ( 0%) wall 179 kB ( 0%) ggc
scheduling 2 : 1.99 ( 3%) usr 0.01 ( 0%) sys 2.23 ( 3%) wall 5795 kB ( 0%) ggc
machine dep reorg : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 1 kB ( 0%) ggc
reorder blocks : 0.32 ( 0%) usr 0.00 ( 0%) sys 0.28 ( 0%) wall 32877 kB ( 1%) ggc
shorten branches : 0.18 ( 0%) usr 0.00 ( 0%) sys 0.19 ( 0%) wall 0 kB ( 0%) ggc
reg stack : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 133 kB ( 0%) ggc
final : 0.38 ( 1%) usr 0.01 ( 0%) sys 0.47 ( 1%) wall 34881 kB ( 1%) ggc
variable output : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 29 kB ( 0%) ggc
tree if-combine : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc
uninit var analysis : 0.01 ( 0%) usr 0.01 ( 0%) sys 0.07 ( 0%) wall 0 kB ( 0%) ggc
straight-line strength reduction: 0.08 ( 0%) usr 0.01 ( 0%) sys 0.02 ( 0%) wall 68 kB ( 0%) ggc
address lowering : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 358 kB ( 0%) ggc
rest of compilation : 1.00 ( 2%) usr 0.04 ( 1%) sys 1.23 ( 2%) wall 20551 kB ( 1%) ggc
remove unused locals : 0.44 ( 1%) usr 0.10 ( 2%) sys 0.52 ( 1%) wall 147 kB ( 0%) ggc
address taken : 0.39 ( 1%) usr 0.06 ( 1%) sys 0.37 ( 1%) wall 0 kB ( 0%) ggc
unaccounted todo : 0.63 ( 1%) usr 0.08 ( 2%) sys 0.54 ( 1%) wall 26331 kB ( 1%) ggc
rebuild frequencies : 0.06 ( 0%) usr 0.01 ( 0%) sys 0.03 ( 0%) wall 1117 kB ( 0%) ggc
repair loop structures : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 0 kB ( 0%) ggc
TOTAL : 65.67 5.30 70.98 3062443 kB