Re: Empty destructor definition disables optimization.

Marc Glisse <marc.glisse@xxxxxxxx> · Wed, 21 Dec 2016 08:46:33 +0100 (CET)

On Tue, 20 Dec 2016, Juan Cabrera wrote:

I was reading the assembly output for a specific use case of a custom
optional<T> implementation I'm writing to see if the `optional<T>` abstraction
would incur more overhead than I was expecting.

This is the smallest and simplest sample code I came up with that illustrates
the problem I'm having:

   #include <exception>

   class M
   {
   public:
       int  value;
       bool valid;

      ~M() { }

       int get() const
       {
           if (valid)
               return value;
           std::terminate();
       }
   };

   M    func_1();
   void func_2(int a);

   int main(int, char**)
   {
       const auto i = func_1();
       if (i.valid)
       {
           func_2(i.get());
           func_2(i.get());
           func_2(i.get());
       }

       return 0;
   }

Now the issue is that the compiler re-checks `i.valid` on each call to
`i.get()`, even though the enclosing `if` statement has already done so.

This is the assembly output from g++ 6.2.1 (compiled with -O3)

   main:
           sub     rsp, 24
           mov     rdi, rsp
           call    func_1()
           cmp     BYTE PTR [rsp+4], 0
           jne     .L7
   .L2:
           xor     eax, eax
           add     rsp, 24
           ret
   .L7:
           mov     edi, DWORD PTR [rsp]
           call    func_2(int)
           cmp     BYTE PTR [rsp+4], 0
           mov     edi, DWORD PTR [rsp]
           je      .L3
           call    func_2(int)
           cmp     BYTE PTR [rsp+4], 0
           mov     edi, DWORD PTR [rsp]
           je      .L3
           call    func_2(int)
           jmp     .L2
   .L3:
           call    std::terminate()

The following slight modification produces better code:

   // class M ....

   M    func_1();
   void func_2(...);

   int main(int, char**)
   {
       const auto i = func_1();
       if (i.valid)
       {
           func_2(i.get(), i.get(), i.get());
       }

       return 0;
   }

--> (compiled with -O3)

   main:
           sub     rsp, 24
           mov     rdi, rsp
           call    func_1()
           cmp     BYTE PTR [rsp+4], 0
           je      .L2
           mov     edi, DWORD PTR [rsp]
           xor     eax, eax
           mov     edx, edi
           mov     esi, edi
           call    func_2(...)
   .L2:
           xor     eax, eax
           add     rsp, 24
           ret

After doing a bunch of tests I found out that by removing the destructor
definition from `M` the generated code for the first code example becomes:

   main:
           push    rbx
           call    func_1()
           mov     rbx, rax
           shr     rax, 32
           test    al, al
           je      .L2
           mov     edi, ebx
           call    func_2(int)
           mov     edi, ebx
           call    func_2(int)
           mov     edi, ebx
           call    func_2(int)
   .L2:
           xor     eax, eax
           pop     rbx
           ret

That's basically what I was expecting.
Of course removing the destructor definition is not an option for the real
code I'm working on (the `value` is inside a union, etc).

Why is the compiler not able to optimize those comparisons away?
clang++ also generates basically the same code as g++ in all cases which makes
me think I must be overlooking something important there.

Hello,

The hard part for the compiler is making sure that calling func_2 cannot
modify i.valid. When the class has a user-provided destructor, the calling
convention uses the return slot optimization (func_1 receives the address
of i and constructs its result there), and gcc considers that the variable
i escapes at that point. Without the destructor, the class is returned in a
register and copied into a purely local i, which func_2 has no way of
modifying.

For example, the two functions could be defined elsewhere as:

M* bad;
M func_1(){
  M ret;
  bad = &ret; // magic: this is the address of i!
  return ret;
}
void func_2(int){
  bad->valid = false;
}
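
To make the aliasing concrete, here is a self-contained sketch that fleshes
out the snippet above. The names `Result` and `demo` are hypothetical helpers
added only for illustration, and the values stored in `ret` are arbitrary. It
assumes the compiler elides the copy on return (gcc and clang normally do so
for a class with a user-provided destructor on x86-64 Linux), and it only
calls func_2 when the saved pointer really equals the address of the caller's
variable. Whether a compiler is entitled to treat this pattern as invalid is
exactly the open question, so take it as an experiment rather than as
guaranteed behaviour.

```cpp
struct M {
    int  value;
    bool valid;
    ~M() {}  // user-provided destructor: M is returned through memory
};

M* bad = nullptr;

M func_1() {
    M ret;
    ret.value = 42;   // arbitrary illustration values
    ret.valid = true;
    bad = &ret;       // if the copy is elided, this is the address of the caller's i
    return ret;
}

void func_2(int) {
    bad->valid = false;  // writes through the saved pointer
}

// Hypothetical helper: reports whether the return slot aliased the local
// variable and, if so, what i.valid looks like after calling func_2.
struct Result {
    bool aliased;
    bool valid_after;
};

Result demo() {
    M i = func_1();
    Result r{bad == &i, i.valid};
    if (r.aliased) {          // only touch `bad` when it is known to point at i
        func_2(i.value);
        r.valid_after = i.valid;
    }
    return r;
}
```

Calling `demo()` from a small main and printing both fields shows which case
applies on a given compiler: when `aliased` is true, `valid_after` comes back
false, which is why the check cannot simply be dropped after each call.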

I don't know whether gcc could consider such code illegal (that would be
decided in the middle-end, so it would apply to all languages, not just
C++). You could search Bugzilla for previous requests and, if there are
none, file one to get a definitive answer.
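
In the meantime, one possible way to sidestep the repeated checks is to read
the value once into a plain local variable, since a local `int` whose address
is never taken cannot be modified by func_2. The sketch below is only an
illustration: `use_once`, `calls` and `last_arg` are hypothetical names, and
the bodies of func_1 and func_2 are stand-ins so that the example links on
its own.

```cpp
#include <exception>

class M
{
public:
    int  value;
    bool valid;

    ~M() { }

    int get() const
    {
        if (valid)
            return value;
        std::terminate();
    }
};

// Hypothetical stand-ins for the real functions, which live elsewhere.
int calls    = 0;
int last_arg = 0;

M func_1()
{
    M m;
    m.value = 7;      // arbitrary illustration value
    m.valid = true;
    return m;
}

void func_2(int a)
{
    ++calls;
    last_arg = a;
}

// Returns the total number of func_2 calls made so far.
int use_once()
{
    const auto i = func_1();
    if (i.valid)
    {
        const int v = i.get();  // the only place i.valid is checked
        func_2(v);              // v is a local copy that func_2 cannot reach
        func_2(v);
        func_2(v);
    }
    return calls;
}
```

This changes behaviour only in the case where func_2 really does modify `i`,
which is presumably not something the real code relies on.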

--
Marc Glisse