On 08/24/2015 09:57 AM, Simon Dardis wrote:
Hello all,

I'm investigating a GCC issue in which an effective FP register-register
move occurs through memory in a well-known benchmark. I've tracked it down
to some interaction of the conditional store elimination pass and the
ssa-dom pass.

The produced assembly with -O3 is:

    g:
            <snip>
            div.d   $f9,$f6,$f4
            li      $8,1            # 0x1
            li      $2,1            # 0x1
            mul.d   $f2,$f9,$f9
            sdc1    $f2,8($9)
    .L39:
            ldc1    $f1,8($9)
            li      $13,1           # 0x1
            <snip>
    L47:
            <snip>
            b       .L39
            sdc1    $f2,8($9)       # delay slot

In the above case, $9 points to some array and $f2 gets written there as
expected. The basic block labelled 47 is another entry to L39 and also
writes out $f2 (with a different value). From L39, though, $f1 is loaded
with the value we just wrote out. GCC has duplicated the store and not
reused $f2.

However, if -fno-tree-dom-opts or -fno-tree-cselim is used, GCC will
generate:

            div.d   $f9,$f4,$f2
            li      $2,1            # 0x1
            li      $7,1            # 0x1
    .L37:
            li      $13,1           # 0x1
            move    $11,$9
            mul.d   $f1,$f9,$f9
            mul.d   $f8,$f1,$f1
            sdc1    $f1,8($9)

which is a great deal better, as there is no effective FP-to-FP move
through memory.

The C code that produces the above looks like:

    if( SomeVal > 1 )
    {
        v = 1 / SomeVal;
        inverted = true ;
    }
    else
    {
        v = SomeVal;
        inverted = false ;
    }
    array[1] = v * v;
    fmadd loop with array[1]

I have been able to reproduce this for x86_64 as well. How might I go
about resolving this issue?
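For anyone who wants to try this locally, the fragment above could be fleshed out into a standalone test case roughly as follows. This is only a sketch: the if/else and the store to array[1] come from the report, while the types, the array size, the names some_val and n, and the body of the loop standing in for the "fmadd loop" are all assumptions. It may or may not trigger the same code generation as the original benchmark.

```c
/* Hypothetical reproducer. Only the if/else and the array[1] store are
   taken from the report; types, sizes, names and the loop are assumed. */
#include <stdbool.h>
#include <stddef.h>

#define ARRAY_LEN 16            /* assumed size, not from the report */

double array[ARRAY_LEN];
bool inverted;

double g(double some_val, size_t n)
{
    double v;

    if (some_val > 1) {
        v = 1 / some_val;
        inverted = true;
    } else {
        v = some_val;
        inverted = false;
    }

    /* The store that ends up duplicated on both paths in the bad case. */
    array[1] = v * v;

    /* Assumed stand-in for the "fmadd loop with array[1]": a
       multiply-accumulate that reads array[1] back on every iteration. */
    double acc = 0.0;
    for (size_t i = 0; i < n && i < ARRAY_LEN; i++)
        acc += array[i] * array[1];

    return acc;
}
```

Compiling this at -O3 with and without -fno-tree-cselim and comparing the generated assembly should show whether the value is reloaded from array[1] instead of being kept in a register.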
You'll need to look at the various dumps to see where things go wrong.

    -fdump-tree-all-details-blocks

jeff