Optimizing returning a struct instance larger than a quadword

Denis Sukhonin <d.sukhonin@xxxxxxxxx> · Mon, 22 Jan 2018 16:36:31 -0500

Hi list,

I am doing some research on my own, trying to understand the best way to
report an error from a function.
My environment:
* exceptions are disabled with -fno-exceptions,
* GCC 7.2,
* C++17, and
* macOS 10.12.

Suppose there is a function which can fail: `FailingFn’. As a simple
and quite common solution, I could give it this signature: `bool
FailingFn(Args…, Error &err)’. This way the bool is returned in a
register, and the err object is allocated on the caller's stack. So I
can avoid accessing memory if the returned bool is true (meaning success).

However, since we have got C++17 with copy elision and structured
bindings, I want to simplify the signature to `std::tuple<bool, Error>
FailingFn(Args…)’, or even `RetValue FailingFn(Args…)’. Then I can
handle errors this way:

if (auto [ok, err] = FailingFn(args…); !ok)
  // Handle error or, perhaps, just return it.
  return err;

This looks more expressive. With a few changes, we can make the `err’
object a variant that also carries the result of a successful evaluation.
For the sake of simplicity, though, let’s assume it carries only error
information, e.g., a `std::string’, which is obviously larger than a
64-bit register.

I am hoping gcc recognizes the case of copy elision, allocates the
error object in the caller, and passes it by reference. Also, having
the returned tuple broken into two independent variables would permit
the compiler to use registers for them, at least for the first one,
which fits in a register (I do not care about the second one until it
carries a payload).
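
For reference, the variant idea mentioned above could look roughly like
this. It is only a sketch: `Result’, `ParsePositive’ and `ValueOr’ are
hypothetical names made up for illustration, and the rest of this
message sticks to the simpler bool-plus-payload layout.

```cpp
#include <cstdint>
#include <string>
#include <variant>

// Hypothetical result type: holds either the computed value or an
// error message. It is not part of the sample discussed in this message.
using Result = std::variant<std::int64_t, std::string>;

// Hypothetical function that may fail: rejects non-positive input.
Result ParsePositive(std::int64_t const input) {
    if (input <= 0) {
        return std::string{"not positive"};
    }
    return input;
}

// Returns the stored value on success, or `fallback' when the variant
// holds an error message instead.
std::int64_t ValueOr(Result const &r, std::int64_t const fallback) {
    if (auto const *value = std::get_if<std::int64_t>(&r)) {
        return *value;
    }
    return fallback;
}
```

Here the success flag is implicit in the variant's index, so the
question of how the discriminator is returned stays the same.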

Here is a sample. Assume we have a structure and some functions
that may fail:

#include <cstdint>
#include <string>

template <typename T>
struct Pair {
    bool ok;
    T    value;
};

Pair<std::int64_t> FuncInt(bool const toFail) {
    return {!toFail, 42};
}

Pair<std::string> FuncString(bool const toFail) {
    return {!toFail, "DEADBEEF"};
}

auto UseInt(bool const flag) {
    if (auto const [ok, value] = FuncInt(flag); ok) {
        return value;
    }
    return std::int64_t{-1};
}

auto UseString(bool const flag) {
    if (auto const [ok, value] = FuncString(flag); ok) {
        return value;
    }
    return std::string{"DEADFA11"};
}

The optimization works with `FuncInt’: `ok’ and `value’ end up in
`eax’ and `edx’. The check compiles into:
call FuncInt(bool)
test eax, eax

But in the case of `FuncString’, this does not happen. Here is what I get:
call FuncString[abi:cxx11](bool)
cmp DWORD PTR [rsp], 0

which obviously compares against memory. My intention is to avoid this
redundant read from memory and use a register instead, as with
`FuncInt’.

Is there a way to tell gcc that instances of Pair should (or at
least can) be broken into two separate variables and returned in the
most efficient way?

I believe this optimization won't work perfectly with the current ABI,
though the compiler need not limit itself to the ABI spec if the call
is to a non-exposed function or `-flto’ is used.
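
To illustrate what "non-exposed" could mean here, a minimal sketch
follows. It assumes internal linkage via an anonymous namespace;
`LocalFuncString’ and `UseLocal’ are hypothetical names that mirror the
sample above. Whether gcc actually picks a cheaper return convention
for such a function is exactly the open question.

```cpp
#include <string>

template <typename T>
struct Pair {
    bool ok;
    T    value;
};

namespace {
// Internal linkage: no other translation unit can call this function,
// so in principle the compiler is free to deviate from the platform
// calling convention for it (assumption: it may or may not do so).
Pair<std::string> LocalFuncString(bool const toFail) {
    return {!toFail, "DEADBEEF"};
}
}  // namespace

// Only caller of LocalFuncString in this translation unit.
std::string UseLocal(bool const flag) {
    if (auto [ok, value] = LocalFuncString(flag); ok) {
        return value;
    }
    return "DEADFA11";
}
```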

Here is a sample with assembly: https://godbolt.org/g/vT2R6Y

In other words, I want `Pair<std::string> FuncString(bool const)’ to
behave like `bool FuncString(bool const, std::string &value)’ while
keeping the expressiveness of C++17, including structured bindings.
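
For comparison, a hand-written version of that target form might look
like the sketch below. `FuncStringOut’ and `UseStringOut’ are
hypothetical names chosen so they do not clash with the sample above;
the point is only that the flag comes back as a plain bool while the
string lives in the caller.

```cpp
#include <string>

// Hypothetical hand-lowered form of FuncString: the success flag is the
// return value, and the payload is written through the reference.
bool FuncStringOut(bool const toFail, std::string &value) {
    value = "DEADBEEF";
    return !toFail;
}

// The caller allocates the string itself and checks only the returned
// flag, so the check does not need to load the flag from the result
// object in memory.
std::string UseStringOut(bool const flag) {
    std::string value;
    if (FuncStringOut(flag, value)) {
        return value;
    }
    return std::string{"DEADFA11"};
}
```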

As I understand it, this problem is very similar to "scalar replacement
of aggregates," but playing around with gcc's optimizer options didn't
give me any results or insight.

I don't have any specific requirements regarding the target OS and
architecture besides it being x86. If it is possible to get this done
in a generic way, I’m happy to know; if it works only in a very
specific environment, I am still glad to know. Perhaps I can try to
implement this optimization with your help if it looks interesting.

--
Best regards,
Denis Sukhonin