Re: [PATCH] powerpc/64: Always build with 128-bit long double

Segher Boessenkool <segher@xxxxxxxxxxxxxxxxxxx> · Wed, 5 Apr 2023 06:04:31 -0500

Hi!

On Wed, Apr 05, 2023 at 03:32:21PM +1000, Michael Ellerman wrote:
> Segher Boessenkool <segher@xxxxxxxxxxxxxxxxxxx> writes:
> > On Tue, Apr 04, 2023 at 08:28:47PM +1000, Michael Ellerman wrote:
> >> The amdgpu driver builds some of its code with hard-float enabled,
> >> whereas the rest of the kernel is built with soft-float.
> >> 
> >> When building with 64-bit long double, if soft-float and hard-float
> >> objects are linked together, the build fails due to incompatible ABI
> >> tags.
> >
> >> Currently those build errors are avoided because the amdgpu driver is
> >> gated on 128-bit long double being enabled. But that's not a detail the
> >> amdgpu driver should need to be aware of, and if another driver starts
> >> using hard-float the same problem would occur.
> >
> > Well.  The kernel driver either has no business using long double (or
> > any other floating point even) at all, or it should know exactly what is
> > used: double precision, double-double, or quadruple precision.  Both of
> > the latter two are 128 bits.
> 
> In a perfect world ... :)

Well, without it knowing what exactly it calculates, does this code have
any business running in kernel space?  Is it acceptable to just do
random things in the kernel?  I don't know the kernel code that uses
long double at all (and I'm afraid to look for fear of going blind), but
all this sounds like the 64-bit IEEE double precision floating point is
not good enough for some certain calculation, but 80-bit extended double
precision as used on x86 is.  That does make it likely that both of our
128-bit formats would work, but there are lots and lots of "buts".  To
start with, what does that code require wrt fp contraction (so, floating
multiply-add)?

All of this suggests that there should not be floating point code here
*at all*, it is harder to use it in any acceptable way than to just do
things in fixed point or scaled integer or whatever.

> >> All versions of the 64-bit ABI specify that long-double is 128-bits.
> >> However some compilers, notably the kernel.org ones, are built to use
> >> 64-bit long double by default.
> >
> > Mea culpa, I suppose?  But buildall doesn't force 64 bit explicitly.
> > I wonder how this happened?  Is it maybe a problem in the powerpc64le
> > config in GCC itself?
> 
> Not blaming anyone, just one of those things that happens.

Oh I didn't say anyone is blaming me.  I want to fix the problem, that
is all :-)

> The
> toolchains the distros (Ubuntu/Fedora) build all seem to use 128, but
> possibly that's because someone told them to configure them that way at
> some point.

No, or yes, depending on how you look at it?  Default configurations all
have 128-bit long double.  But buildall uses (almost) the same
configuration on all targets, namely:

        $GCC_SRC/configure \
        --target=$TARGET --enable-targets=all --prefix=$PREFIX \
        --enable-languages=c --without-headers --disable-bootstrap \
        --disable-nls --disable-threads --disable-shared \
        --disable-libmudflap --disable-libssp --disable-libgomp \
        --disable-decimal-float --disable-libquadmath \
        --disable-libatomic --disable-libcc1 --disable-libmpx

All of this is perfectly reasonable imnsho, but I guess the
--enable-targets=all causes the problem here?  That makes no sense, but
it is still my best guess.

> > I have a patch from summer last year (Arnd's
> > toolchains are built without it) that does
> > +       powerpc64le-*)  TARGET_GCC_CONF=--with-long-double-128
> > Unfortunately I don't remember why I did that, and I never investigated
> > what the deeper problem is :-/
> 
> Last summer (aka winter)

Oh right.  Last July :-)

> is when we first discovered this issue with the
> long double size being implicated.
> 
> See:
>   https://git.kernel.org/torvalds/c/c653c591789b3acfa4bf6ae45d5af4f330e50a91
> 
> So I guess that's what prompted your patch?

It was one day before my patch, maybe less than 12h even, so that could
be.  I don't update the kernel source automatically though (there are
50 to 100 build breaks every year, when things are in decent state I
tend to keep it for a while).  But it may have been our patches are due
to the same cause, and mine is no longer needed?  That would be nice.  I
never committed that patch (or there would be more context, sigh).

I'll dig, there is a real problem in the compiler it seems.  Thanks for
the help so far!

Segher