On Mon, Dec 17, 2018 at 11:16 AM Richard Biener <richard.guenther@xxxxxxxxx> wrote:
>
> On Mon, Dec 17, 2018 at 8:34 AM Thomas Koenig <tkoenig@xxxxxxxxxxxxx> wrote:
> >
> > Hi Harald,
> >
> > > If there's interest, I can create a bugzilla with test program
> > > and test data.
> >
> > Please do.
> >
> > In an ideal world, people would always use bounds checking,
> > with almost zero overhead. This is not realistic, but we should
> > not regress on our way there :-)
>
> GCC 9 IL looks saner than the GCC 7/8 one.  Note both compilers
> have bound checks inside the innermost loop.  The main difference
> seems to be in loop header copying where GCC 9 is behaving
> much "better" IMHO.  It would be interesting to see whether
> -fno-tree-ch brings the results of the compilers in line again
> (even if it causes the code to run even more slowly).

Oh, and -funroll-loops might be an issue as well given the large
number of branches inside the loop body.  Citing the non-unrolled
innermost loop body from GCC 9:

.L19:
        testq   %r8, %r8
        jle     .L25
        cmpq    %r8, %r12
        jl      .L26
        movslq  (%rax), %rdx
        testq   %rdx, %rdx
        jle     .L27
        cmpq    %rdx, %r14
        jl      .L28
        cmpq    %r8, %rcx
        jl      .L29
        cmpq    %r11, %r13
        jl      .L30
        imulq   %r10, %rdx
        vxorpd  %xmm0, %xmm0, %xmm0
        vcvtss2sd       (%rsi), %xmm0, %xmm0
        subq    %r10, %rdx
        leaq    (%r15,%rdx,8), %rdx
        vmovsd  (%rdx), %xmm1
        incq    %r8
        vfmadd132sd     (%r9), %xmm1, %xmm0
        addq    %rbp, %rsi
        addq    %rbx, %rax
        vmovsd  %xmm0, (%rdx)
        cmpl    %r8d, %edi
        jg      .L19

The branch density of the bounds-checking code is quite high (six
conditional branches to error paths ahead of a handful of arithmetic
instructions) and I suspect predictors don't like that very much.
You might want to look at perf output that counts branch mispredicts.

For GCC it might make sense to more (read: very) aggressively
combine test&branches to abort()s.  Maybe the FE can already
do this for bound checks from a single statement?

Richard.

> Richard.
>
> > Regards
> >
> >         Thomas
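
To illustrate what "combining test&branches" for the bound checks of a
single statement could look like at the source level, here is a minimal
C sketch.  It is not what the gfortran FE emits today and not a proposed
patch: the kernel, the function names (update_separate, update_combined)
and the 1-based bounds are assumptions made up for the example, and
returning -1 stands in for the call to the runtime error routine.  The
optimizers are of course free to turn either form into the other.

```c
#include <stddef.h>

/* Hypothetical kernel, loosely shaped like the loop cited above:
   y(idx(i)) = y(idx(i)) + a(i) * x(i), with 1-based bounds.
   ny is the extent of y, nx the extent of idx, a and x.  */

/* Variant 1: one test and one branch per bound, as in the cited
   assembly.  Returns 0 on success, -1 on the first violation.  */
static int update_separate(double *y, long ny, const int *idx,
                           const float *a, const double *x,
                           long nx, long n)
{
    for (long i = 1; i <= n; i++) {
        if (i > nx)
            return -1;          /* i above the extent of idx, a, x */
        long k = idx[i - 1];
        if (k < 1)
            return -1;          /* k below the lower bound of y */
        if (k > ny)
            return -1;          /* k above the upper bound of y */
        y[k - 1] += (double)a[i - 1] * x[i - 1];
    }
    return 0;
}

/* Variant 2: the checks that belong to the statement are merged with
   bitwise | (no short-circuit), leaving a single branch to the error
   path.  The check on i still has to come first because idx(i) must
   not be read out of bounds.  */
static int update_combined(double *y, long ny, const int *idx,
                           const float *a, const double *x,
                           long nx, long n)
{
    for (long i = 1; i <= n; i++) {
        if (i > nx)
            return -1;
        long k = idx[i - 1];
        int bad = (k < 1) | (k > ny);   /* two tests, one branch */
        if (bad)
            return -1;
        y[k - 1] += (double)a[i - 1] * x[i - 1];
    }
    return 0;
}
```

Both variants accept and reject the same inputs; the only intended
difference is the number of conditional branches per iteration, which
is the thing to compare under perf.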