Introducing the "wincall" Calling Convention for GCC

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I present a novel calling convention named "wincall" designed specifically for GCC. This convention is accompanied by the [[__gnu__::__wincall__]] attribute and caters to the latest Intel APX instructions on Windows systems, excluding Linux, BSD, and similar platforms.

Motivation:

The current Windows calling convention exhibits inefficiencies and introduces performance bottlenecks to C++ programs. This is particularly evident in libstdc++ components such as "span" and "string_view," as documented by Microsoft:
Reference: std::span is not zero-cost on microsoft abi.std::span is not zero-cost on microsoft abi.
[https://external-preview.redd.it/3aQ4Ul7hdCTUaypBERyoWUzPoM_LIhTbyaxpxOIdF00.jpg?auto=webp&s=4da8a5de5ad089262bbff35e68693d83c1f67399]<https://www.reddit.com/r/cpp/comments/p0pkcv/stdspan_is_not_zerocost_on_microsoft_abi>
r/cpp on Reddit: std::span is not zero-cost on microsoft abi.<https://www.reddit.com/r/cpp/comments/p0pkcv/stdspan_is_not_zerocost_on_microsoft_abi>
Posted by u/dmyrelot - 138 votes and 87 comments
www.reddit.com



The innovative Herbception mechanism, as proposed in P0709 by Herb Sutter, necessitates passing std::error using two registers and a carry flag. However, the existing Windows calling convention only allows returning one register.

The current calling conventions allocate just four registers for parameter passing. Given that Intel has extended x86_64 registers from 16 to 32 for APX, this presents an opportune moment to introduce a new calling convention to make optimal use of these additional registers.

Notably, Windows DLL APIs are labeled with [[__gnu__::__stdcall]], [[__gnu__::__cdecl__]], or [[__gnu__::__fastcall__]]. Implementing this new convention will not disrupt code that interfaces with DLLs. Furthermore, MSVC provides an option to toggle the default calling convention.

Eliminating the requirement for empty objects to occupy a register slot would substantially ease the burden on C++ programmers.

The Windows ABI already follows a caller-saved approach for passing registers, thus incorporating more registers for parameter passing should not pose issues.

Objectives:

Minimize the register usage for calls into the [[gnu::fastcall]] convention, the sole existing calling convention for Windows.
Retain caller-saved registers, consistent with Windows conventions.
Ensure compatibility with the Itanium C++ ABI, without impacting the sysvabi.
Implement the proposed "wincall" convention first and allow Microsoft and Clang to adopt it subsequently.
Seamlessly integrate with the existing Itanium C++ ABI rule for C++ objects' return behavior (as currently practiced by GCC, not MSVC).
Guidelines:

Eliminate the necessity for empty objects to claim register slots.
Return the first parameter using the rax register and the second parameter using the rdx register (similar to the 32-bit x86 convention).
When dealing with structures of 16 bits, split them into two parameters (unless the object is empty, in which case no registers are used). Objects of lengths 1, 2, 4, 8, 16, 32, or 64 bits employ a single register. A 128-bit object uses two registers, with the remaining bits passed using the object's address.
Adhere to the Itanium ABI rule for C++ objects' return, consistent with GCC's practice.
Preserve the caller-saved parameter approach utilized in current Windows conventions.


floating-point and __m128
stack XMM8  XMM7  XMM6  XMM5  XMM4  XMM3  XMM2  XMM1  XMM0

__m256
stack YMM8  YMM7  YMM6  YMM5  YMM4  YMM3  YMM2  YMM1  YMM0

__m512
stack ZMM8  ZMM7  ZMM6  ZMM5  ZMM4  ZMM3  ZMM2  ZMM1  ZMM0

bool, integer and __uint128_t/__int128_t and std::float128_t
stack R19   R18   R17   R16   R9    R8    RDX   RCX

Aggregates (8, 16, 32, or 64 bits. 128 bits split to 2) and __m64
stack R19   R18   R17   R16   R9    R8    RDX   RCX

Other aggregates, as pointers
stack R19   R18   R17   R16   R9    R8    RDX   RCX

Return values

A scalar return value that can fit into 64 bits, including the __m64 type, is returned through RAX.
A scalar return value that can fit into 128 bits, is returned through RAX (low) and RDX (high).

Non-scalar types including floats, doubles, and vector types such as __m128, __m128i, __m128d are returned in XMM0. The state of unused bits in the value returned in RAX or XMM0 is undefined.

User-defined types can be returned by value from global functions and static member functions. To return a user-defined type by value in RAX (RDX for 128 bits), it must have a length of 1, 2, 4, 8, 16, 32, 64 or 128 bits.

The "Herbception" concept involves a structure named std::error:

struct error
{
void * domain;
uintptr_t code;
};

In the context of this convention, std::error is passed using rax for the "domain" and rdx for the "code." Additionally, a carry flag is employed to handle exceptions. Herbception triggers an exception when the carry flag is set.





[Index of Archives]     [Linux C Programming]     [Linux Kernel]     [eCos]     [Fedora Development]     [Fedora Announce]     [Autoconf]     [The DWARVES Debugging Tools]     [Yosemite Campsites]     [Yosemite News]     [Linux GCC]

  Powered by Linux