Re: question about equivalent x87/x86-64 fpu code...

Pawel Sikora <pluto@xxxxxxxx> · Mon, 16 May 2011 12:45:55 +0200

On Monday 16 of May 2011 11:15:29 Andrew Haley wrote:
> On 13/05/11 19:11, Paweł Sikora wrote:
> > Hi,
> > 
> > i'm using a 3rd-party engine http://glaros.dtc.umn.edu/gkhome/metis/metis/overview
> > for partitioning some complex data. it worked fine for years until today (may 13)...
> > 
> > observations:
> > - the 32-bit metis build produces nice and balanced partitions.
> > - the 64-bit metis build produces bad and unbalanced partitions.
> > 
> > the metis engine uses arrays of integers on the public interface and internally
> > some float-based operations (x<y and x==y) that are unsafe in terms of precision.
> > 
> > so, i've built/tested following metis variants:
> > 
> > 1). -m32 -march=pentium4 -O1                         - works fine.
> > 2). -m32 -march=pentium4 -O1 -mfpmath=sse            - works fine.
> > 3). -m64 -march=x86-64 -O1                           - bad/unbalanced partitions.
> > 4). -m64 -march=x86-64 -O1 -mfpmath=387              - bad/unbalanced partitions.
> > 
> > at this point i've expected wrong results (< 80-bit precision) from variants 2/3
> > and good results from variants 1/4 but the real world differs.
> > 
> > next, i've isolated one place in the sources with a float x<y statement and changed it
> > to (x-y)<0.00001. with that change both native 1/3 variants give nice/equivalent results.
> > 
> > so, where is the problem? are variants 1/4 really equivalent?
> 
> It's going to be very hard for gcc specialists to answer this.  You really
> need a numerical analyst who is familiar with the code to have a look.
>
> This may be a gcc bug, or it may be a bug in the code.  It'd be impossible
> to know without doing more digging into the problem.

Hi,

i've naturally reported these numerical problems to the author in the first place,
but i'm still surprised that the code produced by gcc for x87/x86-64 with explicit
and equal -mpc32/-mfpmath options gives different results.

testcase compiled for 32/64-bit with SSE math and fpu precision forced
to 32-bit gives the same (bad) results:

$ LANG=C make METIS_VER=4.0.1 EXTRA_CFLAGS="-march=core2 -mpc32" EXTRA_CFLAGS32="-mfpmath=sse" EXTRA_CFLAGS64=""
compiling 32-bit metis-4.0.1 testcase...
gcc -m32 -O1 -Imetis-4.0.1 metis-4.0.1/*.c test.c -c -march=core2 -mpc32 -mfpmath=sse
gcc -m32 -lm *.o -o test32.m4.0.1 && rm *.o
./test32.m4.0.1 && mv test{,32.m4.0.1}.out
partition 0: lut+dram:  216506, flip-flop:   56961, bram:  141955
partition 1: lut+dram:   86815, flip-flop:   36485, bram:  143550
partition 2: lut+dram:  142807, flip-flop:   49038, bram:  151525
partition 3: lut+dram:  142517, flip-flop:   48918, bram:  149930
compiling 64-bit metis-4.0.1 testcase...
gcc -m64 -O1 -Imetis-4.0.1 metis-4.0.1/*.c test.c -c -march=core2 -mpc32
gcc -m64 -lm *.o -o test64.m4.0.1 && rm *.o
./test64.m4.0.1 && mv test{,64.m4.0.1}.out
partition 0: lut+dram:  216506, flip-flop:   56961, bram:  141955
partition 1: lut+dram:   86815, flip-flop:   36485, bram:  143550
partition 2: lut+dram:  142807, flip-flop:   49038, bram:  151525
partition 3: lut+dram:  142517, flip-flop:   48918, bram:  149930
diff -u test32.m4.0.1.out test64.m4.0.1.out >test.m4.0.1.out.diff

a similar variant with math forced to x87 behaves differently:

$ LANG=C make METIS_VER=4.0.1 EXTRA_CFLAGS="-march=core2 -mpc32" EXTRA_CFLAGS32="" EXTRA_CFLAGS64="-mfpmath=387"
compiling 32-bit metis-4.0.1 testcase...
gcc -m32 -O1 -Imetis-4.0.1 metis-4.0.1/*.c test.c -c -march=core2 -mpc32
gcc -m32 -lm *.o -o test32.m4.0.1 && rm *.o
./test32.m4.0.1 && mv test{,32.m4.0.1}.out
partition 0: lut+dram:  150173, flip-flop:   46357, bram:  141955
partition 1: lut+dram:  153148, flip-flop:   47089, bram:  143550
partition 2: lut+dram:  141322, flip-flop:   49043, bram:  151525
partition 3: lut+dram:  144002, flip-flop:   48913, bram:  149930
compiling 64-bit metis-4.0.1 testcase...
gcc -m64 -O1 -Imetis-4.0.1 metis-4.0.1/*.c test.c -c -march=core2 -mpc32 -mfpmath=387
gcc -m64 -lm *.o -o test64.m4.0.1 && rm *.o
./test64.m4.0.1 && mv test{,64.m4.0.1}.out
partition 0: lut+dram:  216506, flip-flop:   56961, bram:  141955
partition 1: lut+dram:   86815, flip-flop:   36485, bram:  143550
partition 2: lut+dram:  142807, flip-flop:   49038, bram:  151525
partition 3: lut+dram:  142517, flip-flop:   48918, bram:  149930
diff -u test32.m4.0.1.out test64.m4.0.1.out >test.m4.0.1.out.diff
make: *** [all] Error 1

but... adding -fexcess-precision=standard to the 32-bit testcase again gives me bad but equal results.

$ LANG=C make METIS_VER=4.0.1 EXTRA_CFLAGS="-march=core2 -mpc32" EXTRA_CFLAGS32="-fexcess-precision=standard" EXTRA_CFLAGS64="-mfpmath=387"
compiling 32-bit metis-4.0.1 testcase...
gcc -m32 -O1 -Imetis-4.0.1 metis-4.0.1/*.c test.c -c -march=core2 -mpc32 -fexcess-precision=standard
gcc -m32 -lm *.o -o test32.m4.0.1 && rm *.o
./test32.m4.0.1 && mv test{,32.m4.0.1}.out
partition 0: lut+dram:  216506, flip-flop:   56961, bram:  141955
partition 1: lut+dram:   86815, flip-flop:   36485, bram:  143550
partition 2: lut+dram:  142807, flip-flop:   49038, bram:  151525
partition 3: lut+dram:  142517, flip-flop:   48918, bram:  149930
compiling 64-bit metis-4.0.1 testcase...
gcc -m64 -O1 -Imetis-4.0.1 metis-4.0.1/*.c test.c -c -march=core2 -mpc32 -mfpmath=387
gcc -m64 -lm *.o -o test64.m4.0.1 && rm *.o
./test64.m4.0.1 && mv test{,64.m4.0.1}.out
partition 0: lut+dram:  216506, flip-flop:   56961, bram:  141955
partition 1: lut+dram:   86815, flip-flop:   36485, bram:  143550
partition 2: lut+dram:  142807, flip-flop:   49038, bram:  151525
partition 3: lut+dram:  142517, flip-flop:   48918, bram:  149930
diff -u test32.m4.0.1.out test64.m4.0.1.out >test.m4.0.1.out.diff

should -mpc32 and an equal fpmath model produce equal results (no matter whether good or bad)?
or maybe there's a bug in gcc exposed by the explicit -fexcess-precision option?

should i report this as a potential gcc bug?

BR,
Paweł.