Re: Using SSE2 with the old i386 ABI

Jeff Law <law@xxxxxxxxxx> · Mon, 31 Jul 2017 09:15:33 -0600



On 07/31/2017 08:04 AM, Florian Weimer wrote:
> Fedora is considering getting rid of the i686 kernel.  If that happens,
> all i686 installations will be in fact x86-64, so they have SSE2.
> 
> There has been an uncoordinated, breaking ABI change for i386 when the
> stack alignment requirements were changed.  A lot of software uses
> 4-byte alignment, perhaps based on this recommendation from the GCC manual:
Well, I think it's also the case that 4 byte alignment was baked into
the x86 ABI eons ago -- getting everyone to update has been tough.
We've certainly seen problems with other compilers leaving the stack in
a misaligned state.

> 
> '-mincoming-stack-boundary=NUM'
>      Assume the incoming stack is aligned to a 2 raised to NUM byte
>      boundary.  If '-mincoming-stack-boundary' is not specified, the one
>      specified by '-mpreferred-stack-boundary' is used.
> 
>      On Pentium and Pentium Pro, 'double' and 'long double' values
>      should be aligned to an 8-byte boundary (see '-malign-double') or
>      suffer significant run time performance penalties.  On Pentium III,
>      the Streaming SIMD Extension (SSE) data type '__m128' may not work
>      properly if it is not 16-byte aligned.
> …
>      This extra alignment does consume extra stack space, and generally
>      increases code size.  Code that is sensitive to stack space usage,
>      such as embedded systems and operating system kernels, may want to
>      reduce the preferred alignment to '-mpreferred-stack-boundary=2'.
I doubt many folks actively use this -- however, it only takes a couple
of misguided library developers using this option to create a world of
pain.


> 
> If we start compiling system libraries with SSE2 support enabled, we
> must make sure that they do not assume the stack is aligned by more than 4
> bytes.  Would -mincoming-stack-boundary=2 do that?
Uros would know for sure.
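
For reference, the NUM in these options is an exponent, so =2 means a
4-byte boundary and =4 means a 16-byte boundary.  The sketch below only
illustrates that arithmetic and what rounding a stack address down to a
boundary looks like; it says nothing about what GCC actually emits.  The
function names and the sample address are made up for illustration.

```c
#include <stdint.h>

/* Byte boundary implied by NUM in -mincoming-stack-boundary=NUM
   (hypothetical helper, not part of GCC). */
uintptr_t stack_boundary_bytes(unsigned num)
{
    return (uintptr_t)1 << num;
}

/* Nonzero if addr sits on a 2^num byte boundary. */
int is_aligned_to(uintptr_t addr, unsigned num)
{
    return (addr & (stack_boundary_bytes(num) - 1)) == 0;
}

/* Round addr down to a 2^num byte boundary; stacks grow down on x86,
   so realigning a stack pointer means rounding down. */
uintptr_t align_down(uintptr_t addr, unsigned num)
{
    return addr & ~(stack_boundary_bytes(num) - 1);
}
```

An address such as 0xbffff00c satisfies NUM=2 but not NUM=4, which is the
situation a caller built with 4-byte stack alignment can create.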

> 
> Will GCC still maintain stack alignment if such code is called with a
> properly aligned stack?  (This is important so that callbacks can still
> use SSE2 with the default ABI.)
It's supposed to.  My concern would be that most of the time a
misaligned stack just works -- it's only when we hit those key SSE2
instructions that require aligned memory operands that it'll fault.  So
bugs in this support could stay latent for a long time.
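
Separately from the command-line option, GCC's x86 function attribute
force_align_arg_pointer asks for the stack to be realigned in the
prologue of a single function.  A minimal sketch of an entry point that
might be called with only 4-byte alignment, assuming a GNU-compatible
compiler; FORCE_REALIGN and sum4_entry are hypothetical names, and whether
this is adequate for system libraries is not something this example
settles:

```c
#include <stddef.h>

/* FORCE_REALIGN is a hypothetical helper macro for this sketch.  The
   attribute is x86-specific, so it expands to nothing elsewhere. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define FORCE_REALIGN __attribute__((force_align_arg_pointer))
#else
#define FORCE_REALIGN
#endif

/* Entry point that may be reached from a caller with a 4-byte aligned
   stack.  The local buffer wants 16-byte alignment, which is the kind
   of object an aligned SSE store would target. */
FORCE_REALIGN
float sum4_entry(const float *in)
{
    _Alignas(16) float tmp[4];

    for (size_t i = 0; i < 4; i++)
        tmp[i] = in[i] * 2.0f;

    return tmp[0] + tmp[1] + tmp[2] + tmp[3];
}
```

Because the attribute is per function, it only covers entry points
someone remembered to annotate, so it does not remove the latent-bug
concern above.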

jeff