Spurious optimization failures - unnecessary stack frame management

Hi,

[please CC me, I am not subscribed to this list]

I am writing a C++ expression template wrapper library for FLINT [0]. I am finding that across gcc versions, and with no apparent pattern, the optimizer sometimes fails to properly eliminate stack frame management. Is this a known problem? What parameter values should one increase to have the optimizer do this more aggressively?

I am working on x86-64, if that is relevant.

Please excuse my being so vague; unfortunately, I do not know much about optimizer internals. Let me show you an example. Consider the function

void
test_fmpzxx_asymadd_1 (fmpzxx& out, const fmpzxx& a,
        const fmpzxx& b, const fmpzxx& c, const fmpzxx& d)
{
    out = (a + (((b + (c + (a + b))) + c) + d));
}

The type fmpzxx has a single data member, which is a "long". One may obtain a pointer to this data member using the _fmpz() method. Using some expression template magic [1], the above line is turned into function calls to a C library, essentially equivalent to the following:

void
test_fmpzxx_asymadd_2 (fmpzxx& out, const fmpzxx& a,
        const fmpzxx& b, const fmpzxx& c, const fmpzxx& d)
{
    fmpz_t tmp;
    fmpz_init (tmp);

    fmpz_add (tmp, a._fmpz (), b._fmpz ());
    fmpz_add (tmp, c._fmpz (), tmp);
    fmpz_add (tmp, b._fmpz (), tmp);
    fmpz_add (tmp, tmp, c._fmpz ());
    fmpz_add (tmp, tmp, d._fmpz ());
    fmpz_add (out._fmpz(), a._fmpz (), tmp);

    fmpz_clear (tmp);
}
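
To make the "expression template magic" more concrete, here is a minimal sketch of the general technique. It is not the actual FLINT wrapper code: the names Num, Sum, Expr, toy_add and toy_asymadd are hypothetical stand-ins, and toy_add merely plays the role of fmpz_add on a plain long. Every operator+ builds a small node object holding references to its operands, and the assignment walks the resulting tree, evaluating each operand into a temporary before calling the "C" function. The optimizer is then expected to inline everything and remove the nodes and temporaries.

```cpp
#include <cassert>

// Hypothetical stand-in for the C library call (plays the role of fmpz_add).
static int add_calls = 0;
static void toy_add(long* out, const long* a, const long* b)
{
    ++add_calls;
    *out = *a + *b;
}

// CRTP base: lets operator+ accept any expression node.
template <class E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Node representing "lhs + rhs"; stores references only, no values.
template <class L, class R>
struct Sum : Expr<Sum<L, R> > {
    const L& lhs;
    const R& rhs;
    Sum(const L& l, const R& r) : lhs(l), rhs(r) {}

    // Evaluate both operands into temporaries, then call the "C" function.
    void eval(long* out) const
    {
        long tl, tr;
        lhs.eval(&tl);
        rhs.eval(&tr);
        toy_add(out, &tl, &tr);
    }
};

// Leaf type with a single "long" data member, like the wrapper described above.
struct Num : Expr<Num> {
    long v;
    explicit Num(long x = 0) : v(x) {}
    void eval(long* out) const { *out = v; }

    // Assigning an expression triggers the evaluation of the whole tree.
    template <class E>
    Num& operator=(const Expr<E>& e)
    {
        e.self().eval(&v);
        return *this;
    }
};

// Builds a node; nothing is computed here.
template <class L, class R>
Sum<L, R> operator+(const Expr<L>& a, const Expr<R>& b)
{
    return Sum<L, R>(a.self(), b.self());
}

// Same shape as test_fmpzxx_asymadd_1 above.
void toy_asymadd(Num& out, const Num& a, const Num& b,
        const Num& c, const Num& d)
{
    out = (a + (((b + (c + (a + b))) + c) + d));
}
```

Calling toy_asymadd with a, b, c, d holding 1, 2, 3, 4 performs six toy_add calls, one per "+" in the expression. Before optimization each Sum node carries two references and each eval two temporaries, which is the kind of scaffolding the compiler has to see through to arrive at the hand-written version.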

However, to attain this, the optimizer has to eliminate many temporaries, inline calls, track pointers, etc. It seems to me that, for no apparent reason, this sometimes goes wrong. For example, with g++-4.6.4 or g++-4.8.1, both of the above functions yield essentially identical machine code, with a stack frame size of about 56 bytes. On the other hand, g++-4.7.3 produces the attached code for test_fmpzxx_asymadd_1 [NB: this is compiled without exception support, to simplify comparison to the pure C code]. (I obtained this via objdump, since I did not find the extra labels etc. produced by g++ -S helpful.) Notice that the stack frame size has grown to 376 bytes! I have been trying to understand the produced code, but could not make much sense of it. Some parts of the stack frame are initialized, then copied around, and then other data is used in calling the C functions. It seems like the optimizer just stopped arbitrarily, presumably because of some heuristic cutoff. My main question is: is there a switch to tune this heuristic?

Please note that this problem is not specific to version 4.7.3. There are other (similar) examples where, e.g., 4.7.3 optimizes just fine but 4.8.1 produces similarly silly code.

Thanks,
Tom

[0] http://www.flintlib.org/
[1] It is a rather big library by now, so I am trying to avoid showing the relevant C++ code. In particular, all my attempts at isolating a "minimal problematic example" have caused the optimizer to kick in before the code reached an acceptably small size.

You can find all the code at https://github.com/ness01/flint2/tree/gsoc; the functions test_fmpzxx_asymadd_? discussed are found in cxx/test/t-codegen.cpp.

0000000000402d80 <test_fmpzxx_asymadd_1>:
  402d80:	48 89 5c 24 d0       	mov    %rbx,-0x30(%rsp)
  402d85:	48 89 6c 24 d8       	mov    %rbp,-0x28(%rsp)
  402d8a:	49 89 f1             	mov    %rsi,%r9
  402d8d:	4c 89 64 24 e0       	mov    %r12,-0x20(%rsp)
  402d92:	4c 89 6c 24 e8       	mov    %r13,-0x18(%rsp)
  402d97:	48 89 fd             	mov    %rdi,%rbp
  402d9a:	4c 89 74 24 f0       	mov    %r14,-0x10(%rsp)
  402d9f:	4c 89 7c 24 f8       	mov    %r15,-0x8(%rsp)
  402da4:	48 81 ec 78 01 00 00 	sub    $0x178,%rsp
  402dab:	4c 89 84 24 88 00 00 	mov    %r8,0x88(%rsp)
  402db2:	00 
  402db3:	48 89 74 24 10       	mov    %rsi,0x10(%rsp)
  402db8:	48 89 cb             	mov    %rcx,%rbx
  402dbb:	48 89 54 24 30       	mov    %rdx,0x30(%rsp)
  402dc0:	48 89 4c 24 40       	mov    %rcx,0x40(%rsp)
  402dc5:	48 8d bc 24 a8 00 00 	lea    0xa8(%rsp),%rdi
  402dcc:	00 
  402dcd:	48 89 74 24 50       	mov    %rsi,0x50(%rsp)
  402dd2:	48 89 54 24 58       	mov    %rdx,0x58(%rsp)
  402dd7:	48 8d 74 24 10       	lea    0x10(%rsp),%rsi
  402ddc:	48 89 4c 24 78       	mov    %rcx,0x78(%rsp)
  402de1:	b9 12 00 00 00       	mov    $0x12,%ecx
  402de6:	48 c7 04 24 00 00 00 	movq   $0x0,(%rsp)
  402ded:	00 
  402dee:	f3 48 a5             	rep movsq %ds:(%rsi),%es:(%rdi)
  402df1:	4c 89 ce             	mov    %r9,%rsi
  402df4:	48 89 e7             	mov    %rsp,%rdi
  402df7:	4c 8b bc 24 c8 00 00 	mov    0xc8(%rsp),%r15
  402dfe:	00 
  402dff:	4c 8b b4 24 10 01 00 	mov    0x110(%rsp),%r14
  402e06:	00 
  402e07:	4c 8b a4 24 a8 00 00 	mov    0xa8(%rsp),%r12
  402e0e:	00 
  402e0f:	4c 8b ac 24 20 01 00 	mov    0x120(%rsp),%r13
  402e16:	00 
  402e17:	e8 24 ee ff ff       	callq  401c40 <fmpz_add@plt>
  402e1c:	48 89 e2             	mov    %rsp,%rdx
  402e1f:	48 89 de             	mov    %rbx,%rsi
  402e22:	48 89 e7             	mov    %rsp,%rdi
  402e25:	e8 16 ee ff ff       	callq  401c40 <fmpz_add@plt>
  402e2a:	48 89 e2             	mov    %rsp,%rdx
  402e2d:	4c 89 fe             	mov    %r15,%rsi
  402e30:	48 89 e7             	mov    %rsp,%rdi
  402e33:	e8 08 ee ff ff       	callq  401c40 <fmpz_add@plt>
  402e38:	4c 89 f2             	mov    %r14,%rdx
  402e3b:	48 89 e6             	mov    %rsp,%rsi
  402e3e:	48 89 e7             	mov    %rsp,%rdi
  402e41:	e8 fa ed ff ff       	callq  401c40 <fmpz_add@plt>
  402e46:	4c 89 ea             	mov    %r13,%rdx
  402e49:	48 89 e6             	mov    %rsp,%rsi
  402e4c:	48 89 e7             	mov    %rsp,%rdi
  402e4f:	e8 ec ed ff ff       	callq  401c40 <fmpz_add@plt>
  402e54:	48 89 ef             	mov    %rbp,%rdi
  402e57:	48 89 e2             	mov    %rsp,%rdx
  402e5a:	4c 89 e6             	mov    %r12,%rsi
  402e5d:	e8 de ed ff ff       	callq  401c40 <fmpz_add@plt>
  402e62:	48 8b 3c 24          	mov    (%rsp),%rdi
  402e66:	48 89 f8             	mov    %rdi,%rax
  402e69:	48 c1 f8 3e          	sar    $0x3e,%rax
  402e6d:	48 83 f8 01          	cmp    $0x1,%rax
  402e71:	74 3d                	je     402eb0 <test_fmpzxx_asymadd_1+0x130>
  402e73:	48 8b 9c 24 48 01 00 	mov    0x148(%rsp),%rbx
  402e7a:	00 
  402e7b:	48 8b ac 24 50 01 00 	mov    0x150(%rsp),%rbp
  402e82:	00 
  402e83:	4c 8b a4 24 58 01 00 	mov    0x158(%rsp),%r12
  402e8a:	00 
  402e8b:	4c 8b ac 24 60 01 00 	mov    0x160(%rsp),%r13
  402e92:	00 
  402e93:	4c 8b b4 24 68 01 00 	mov    0x168(%rsp),%r14
  402e9a:	00 
  402e9b:	4c 8b bc 24 70 01 00 	mov    0x170(%rsp),%r15
  402ea2:	00 
  402ea3:	48 81 c4 78 01 00 00 	add    $0x178,%rsp
  402eaa:	c3                   	retq   
  402eab:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
  402eb0:	e8 eb ed ff ff       	callq  401ca0 <_fmpz_clear_mpz@plt>
  402eb5:	eb bc                	jmp    402e73 <test_fmpzxx_asymadd_1+0xf3>
  402eb7:	66 0f 1f 84 00 00 00 	nopw   0x0(%rax,%rax,1)
  402ebe:	00 00 
