On Monday 16 of May 2011 11:15:29 Andrew Haley wrote:
> On 13/05/11 19:11, Paweł Sikora wrote:
> > Hi,
> >
> > i'm using a 3rd-party engine http://glaros.dtc.umn.edu/gkhome/metis/metis/overview
> > for partitioning some complex data. it worked fine for years until today (may 13)...
> >
> > observations:
> > - the 32-bit metis build produces nice and balanced partitions.
> > - the 64-bit metis build produces bad and unbalanced partitions.
> >
> > the metis engine uses arrays of integers on the public interface and internally
> > some float-based operations that are unsafe in terms of precision (x<y and x==y).
> >
> > so, i've built/tested the following metis variants:
> >
> > 1). -m32 -march=pentium4 -O1 - works fine.
> > 2). -m32 -march=pentium4 -O1 -mfpmath=sse - works fine.
> > 3). -m64 -march=x86-64 -O1 - bad/unbalanced partitions.
> > 4). -m64 -march=x86-64 -O1 -mfpmath=387 - bad/unbalanced partitions.
> >
> > at this point i expected wrong results (< 80-bit precision) from variants 2/3
> > and good results from variants 1/4 but the real world differs.
> >
> > next, i've isolated one place in the sources with a float x<y stmt and changed it
> > to (x-y)<0.00001. with this change both native 1/3 variants give nice/equivalent results.
> >
> > so, where is the problem? are the variants 1/4 really equivalent?
>
> It's going to be very hard for gcc specialists to answer this. You really
> need a numerical analyst who is familiar with the code to have a look.
>
> This may be a gcc bug, or it may be a bug in the code. It'd be impossible
> to know without doing more digging into the problem.

Hi,

i've naturally reported these numerical problems to the author in the first place,
but i'm still surprised that code produced by gcc for x87/x86-64 with explicit and
equal -mpc32/-mfpmath options gives different results.
the testcase compiled for 32/64-bit with SSE math and fpu precision forced to 32-bit
gives the same (bad) results:

$ LANG=C make METIS_VER=4.0.1 EXTRA_CFLAGS="-march=core2 -mpc32" EXTRA_CFLAGS32="-mfpmath=sse" EXTRA_CFLAGS64=""
compiling 32-bit metis-4.0.1 testcase...
gcc -m32 -O1 -Imetis-4.0.1 metis-4.0.1/*.c test.c -c -march=core2 -mpc32 -mfpmath=sse
gcc -m32 -lm *.o -o test32.m4.0.1 && rm *.o
./test32.m4.0.1 && mv test{,32.m4.0.1}.out
partition 0: lut+dram: 216506, flip-flop: 56961, bram: 141955
partition 1: lut+dram: 86815, flip-flop: 36485, bram: 143550
partition 2: lut+dram: 142807, flip-flop: 49038, bram: 151525
partition 3: lut+dram: 142517, flip-flop: 48918, bram: 149930
compiling 64-bit metis-4.0.1 testcase...
gcc -m64 -O1 -Imetis-4.0.1 metis-4.0.1/*.c test.c -c -march=core2 -mpc32
gcc -m64 -lm *.o -o test64.m4.0.1 && rm *.o
./test64.m4.0.1 && mv test{,64.m4.0.1}.out
partition 0: lut+dram: 216506, flip-flop: 56961, bram: 141955
partition 1: lut+dram: 86815, flip-flop: 36485, bram: 143550
partition 2: lut+dram: 142807, flip-flop: 49038, bram: 151525
partition 3: lut+dram: 142517, flip-flop: 48918, bram: 149930
diff -u test32.m4.0.1.out test64.m4.0.1.out >test.m4.0.1.out.diff

a similar variant with math forced to x87 behaves differently:

$ LANG=C make METIS_VER=4.0.1 EXTRA_CFLAGS="-march=core2 -mpc32" EXTRA_CFLAGS32="" EXTRA_CFLAGS64="-mfpmath=387"
compiling 32-bit metis-4.0.1 testcase...
gcc -m32 -O1 -Imetis-4.0.1 metis-4.0.1/*.c test.c -c -march=core2 -mpc32
gcc -m32 -lm *.o -o test32.m4.0.1 && rm *.o
./test32.m4.0.1 && mv test{,32.m4.0.1}.out
partition 0: lut+dram: 150173, flip-flop: 46357, bram: 141955
partition 1: lut+dram: 153148, flip-flop: 47089, bram: 143550
partition 2: lut+dram: 141322, flip-flop: 49043, bram: 151525
partition 3: lut+dram: 144002, flip-flop: 48913, bram: 149930
compiling 64-bit metis-4.0.1 testcase...
gcc -m64 -O1 -Imetis-4.0.1 metis-4.0.1/*.c test.c -c -march=core2 -mpc32 -mfpmath=387
gcc -m64 -lm *.o -o test64.m4.0.1 && rm *.o
./test64.m4.0.1 && mv test{,64.m4.0.1}.out
partition 0: lut+dram: 216506, flip-flop: 56961, bram: 141955
partition 1: lut+dram: 86815, flip-flop: 36485, bram: 143550
partition 2: lut+dram: 142807, flip-flop: 49038, bram: 151525
partition 3: lut+dram: 142517, flip-flop: 48918, bram: 149930
diff -u test32.m4.0.1.out test64.m4.0.1.out >test.m4.0.1.out.diff
make: *** [all] Error 1

but... adding -fexcess-precision=standard to the 32-bit testcase again gives me
bad but equal results:

$ LANG=C make METIS_VER=4.0.1 EXTRA_CFLAGS="-march=core2 -mpc32" EXTRA_CFLAGS32="-fexcess-precision=standard" EXTRA_CFLAGS64="-mfpmath=387"
compiling 32-bit metis-4.0.1 testcase...
gcc -m32 -O1 -Imetis-4.0.1 metis-4.0.1/*.c test.c -c -march=core2 -mpc32 -fexcess-precision=standard
gcc -m32 -lm *.o -o test32.m4.0.1 && rm *.o
./test32.m4.0.1 && mv test{,32.m4.0.1}.out
partition 0: lut+dram: 216506, flip-flop: 56961, bram: 141955
partition 1: lut+dram: 86815, flip-flop: 36485, bram: 143550
partition 2: lut+dram: 142807, flip-flop: 49038, bram: 151525
partition 3: lut+dram: 142517, flip-flop: 48918, bram: 149930
compiling 64-bit metis-4.0.1 testcase...
gcc -m64 -O1 -Imetis-4.0.1 metis-4.0.1/*.c test.c -c -march=core2 -mpc32 -mfpmath=387
gcc -m64 -lm *.o -o test64.m4.0.1 && rm *.o
./test64.m4.0.1 && mv test{,64.m4.0.1}.out
partition 0: lut+dram: 216506, flip-flop: 56961, bram: 141955
partition 1: lut+dram: 86815, flip-flop: 36485, bram: 143550
partition 2: lut+dram: 142807, flip-flop: 49038, bram: 151525
partition 3: lut+dram: 142517, flip-flop: 48918, bram: 149930
diff -u test32.m4.0.1.out test64.m4.0.1.out >test.m4.0.1.out.diff

should -mpc32 and an equal fpmath model produce equal results (no matter good or bad)?
or maybe there's a bug in gcc exposed by the explicit -fexcess-precision option?
should i report this as a potential gcc bug?

BR,
Paweł.