Re: Re: math broken on mips

Zhang Fuxin <fxzhang@ict.ac.cn> · Mon, 18 Feb 2002 17:51:22 +0800

hi,
    I find that you are asking only gcc related parts,that is related less,
here they are(from posts on linux-mips,i cc them to the list in case it is
useful,sorry if it brings you inconvenience):

1.about NaN handling
-----------------begin of the first problem--------------------------------
I am sorry but it seems i can't fix this without ugly changes.
since i am not familiar with gcc code, i decide to leave it to you,
but provide some information instead.

  In gcc there are 3 spaces where the NaN handling is assumed the 
Intel way.

   1. gcc/real.c (the most important one)
       here the author seems to have known the NaN pattern problem,so
     he leaves a interface macro for defining non intel NaN patterns:
     (comment of function "make_nan()",at about line 6219)

/* Output a binary NaN bit pattern in the target machine's format.  */

/* If special NaN bit patterns are required, define them in tm.h
   as arrays of unsigned 16-bit shorts.  Otherwise, use the default
   patterns here.  */

  I have read through this file and decided that the follow defined should
be enough for mips:

/* NaN pattern,mips QNAN & SNAN is different from intel's 
 * DFMODE_NAN and SFMODE_NAN is used in real.c */
#define DFMODE_NAN \
        unsigned short DFbignan[4] = {0x7ff7, 0xffff, 0xffff, 0xffff}; \
        unsigned short DFlittlenan[4] = {0xffff, 0xffff, 0xffff, 0xfff7}
#define SFMODE_NAN \
        unsigned short SFbignan[2] = {0x7fbf, 0xffff}; \
        unsigned short SFlittlenan[2] = {0xffff, 0xffbf}

   But the problem is where to put them:(. Obviously it is target specified
definitions and should be in config/mips/. Documents say tm.h is a symbol
link and included in config.h,but it is no long true.If i add them to xm-mips.h
then for native compilation it is ok but it fails for cross-compile.

   2.gcc/reg-stack.c (H.J. tell me it is not used on mips)
     There is a hardcoded QNaN used around line 477:
      nan = gen_lowpart (SFmode, GEN_INT (0x7fc00000));
     I sugest defining a macro QNAN_HAS_1ST_FRACBIT_CLEARED for mips and change
     it to,just don't know where to put it:
      #ifndef QNAN_HAS_1ST_FRACBIT_CLEARED
         nan = gen_lowpart (SFmode, GEN_INT (0x7fc00000));
      #else
         nan = gen_lowpart (SFmode, GEN_INT (0x7fbfffff));
      #endif

   3. config/fp-bit.c (H.J. said it is not mean to target specified)
      this is for machine having no fpu hardware.
      again i susgest define QNAN_HAS_1ST_FRACBIT_CLEARED and then apply this patch:

190d189
< #ifndef QNAN_HAS_1ST_FRACBIT_CLEARED
192,195d190
< #else
<         fraction &= ~QUIET_NAN;
< #endif
< 
379,380d373
< 
< #ifndef QNAN_HAS_1ST_FRACBIT_CLEARED
382,384d374
< #else
<         if (!(fraction & QUIET_NAN))
< #endif

�� 2002-02-03 22:54:00 you wrote��
>On Mon, Feb 04, 2002 at 02:22:48PM +0800, Zhang Fuxin wrote:
>> hi,
>> 
>> Gcc (2.96 20000731,H.J.LU's rh port for mips) think 0x7fc00000 is QNaN and 
>> optimize 0.0/0.0 as 0x7fc00000 for single precision ops,while for my cpu
>> (maybe most mips cpu) is a SNaN. R4k user's manual and "See Mips Run" both
>>  say so.And experiments confirm this.
>> 
>> Should we correct it?
>
>Yes. Do you have a patch?
>
>Thanks.
>
>
>H.J.
-------------------------end of first problem--------------------------------

--------------------------begin of the second problem------------------------

2. gcc -O2 optimization bug No.1
    It seems gcc with optimization > -O2 will produce incorrect code for fp code:
I find this example when tracking down the libm-test failures.
  ( expf(NaN) == NaN will report an extra "invalid operation" exception )
The faulting code snippet is(from glibc/sysdeps/ieee754/flt-32/w_expf.c):
	float __my_expf(float x)		/* wrapper expf */
{
	float z;
	unsigned int hx;
	z = __ieee754_expf(x);
	if(_LIB_VERSION == _IEEE_) return z;
	if(__finitef(x)) {
	    if(x>o_threshold)
		return 1.0;
	    else if(x<u_threshold)
		return 2.0;
            printf("haha\n");
	} 
	return z;
}

with -O2(or higher),some statements inside "if (__finitef(x))" are put before
testing the return value of __finitef(x),one of which is a c.lt.s that cause
the extra "invalid operation" exception being raised.

the obj codes is listed below:
1.  without -O
0000000000401290 <__my_expf>:
  401290:	3c1c0fc0 	lui	gp,0xfc0
  401294:	279c6dd0 	addiu	gp,gp,28112
  401298:	0399e021 	addu	gp,gp,t9
  40129c:	27bdffd0 	addiu	sp,sp,-48
  4012a0:	afbc0010 	sw	gp,16(sp)
  4012a4:	afbf0028 	sw	ra,40(sp)
  4012a8:	afbe0024 	sw	s8,36(sp)
  4012ac:	afbc0020 	sw	gp,32(sp)
  4012b0:	03a0f021 	move	s8,sp
  4012b4:	e7cc0030 	swc1	$f12,48(s8)
  4012b8:	c7cc0030 	lwc1	$f12,48(s8)
  4012bc:	8f998088 	lw	t9,-32632(gp)
  4012c0:	00000000 	nop
  4012c4:	0320f809 	jalr	t9
  4012c8:	00000000 	nop
  4012cc:	8fdc0010 	lw	gp,16(s8)
  4012d0:	e7c00018 	swc1	$f0,24(s8)
  4012d4:	8f8380ac 	lw	v1,-32596(gp)
  4012d8:	00000000 	nop
  4012dc:	8c630000 	lw	v1,0(v1)
  4012e0:	2402ffff 	li	v0,-1
  4012e4:	14620004 	bne	v1,v0,4012f8 <__my_expf+0x68>
  4012e8:	00000000 	nop
  4012ec:	c7c00018 	lwc1	$f0,24(s8)
  4012f0:	10000032 	b	4013bc <__my_expf+0x12c>
  4012f4:	00000000 	nop
  4012f8:	c7cc0030 	lwc1	$f12,48(s8)
  4012fc:	8f998090 	lw	t9,-32624(gp)
  401300:	00000000 	nop
  401304:	0320f809 	jalr	t9
  401308:	00000000 	nop
  40130c:	8fdc0010 	lw	gp,16(s8)
  401310:	10400029 	beqz	v0,4013b8 <__my_expf+0x128>
  401314:	00000000 	nop
  401318:	c7c20030 	lwc1	$f2,48(s8)
  40131c:	8f818018 	lw	at,-32744(gp)
  401320:	00000000 	nop
  401324:	242142a0 	addiu	at,at,17056
  401328:	c4200000 	lwc1	$f0,0(at)
  40132c:	00000000 	nop
  401330:	4602003c 	c.lt.s	$f0,$f2
  401334:	00000000 	nop
  401338:	45010003 	bc1t	401348 <__my_expf+0xb8>
  40133c:	00000000 	nop
  401340:	10000005 	b	401358 <__my_expf+0xc8>
  401344:	00000000 	nop
  401348:	3c013f80 	lui	at,0x3f80
  40134c:	44810000 	mtc1	at,$f0
  401350:	1000001a 	b	4013bc <__my_expf+0x12c>
  401354:	00000000 	nop
  401358:	c7c20030 	lwc1	$f2,48(s8)
  40135c:	8f818018 	lw	at,-32744(gp)
  401360:	00000000 	nop
  401364:	242142a4 	addiu	at,at,17060
  401368:	c4200000 	lwc1	$f0,0(at)
  40136c:	00000000 	nop
  401370:	4600103c 	c.lt.s	$f2,$f0
  401374:	00000000 	nop
  401378:	45010003 	bc1t	401388 <__my_expf+0xf8>
  40137c:	00000000 	nop
  401380:	10000005 	b	401398 <__my_expf+0x108>
  401384:	00000000 	nop
  401388:	3c014000 	lui	at,0x4000
  40138c:	44810000 	mtc1	at,$f0
  401390:	1000000a 	b	4013bc <__my_expf+0x12c>
  401394:	00000000 	nop
  401398:	8f848018 	lw	a0,-32744(gp)
  40139c:	00000000 	nop
  4013a0:	248442a8 	addiu	a0,a0,17064
  4013a4:	8f998058 	lw	t9,-32680(gp)
  4013a8:	00000000 	nop
  4013ac:	0320f809 	jalr	t9
  4013b0:	00000000 	nop
  4013b4:	8fdc0010 	lw	gp,16(s8)
  4013b8:	c7c00018 	lwc1	$f0,24(s8)
  4013bc:	03c0e821 	move	sp,s8
  4013c0:	8fbf0028 	lw	ra,40(sp)
  4013c4:	8fbe0024 	lw	s8,36(sp)
  4013c8:	03e00008 	jr	ra
  4013cc:	27bd0030 	addiu	sp,sp,48

2. with -O2
   0000000000400fc0 <__my_expf>:
  400fc0:	3c1c0fc0 	lui	gp,0xfc0
  400fc4:	279c70a0 	addiu	gp,gp,28832
  400fc8:	0399e021 	addu	gp,gp,t9
  400fcc:	27bdffd0 	addiu	sp,sp,-48
  400fd0:	afbc0010 	sw	gp,16(sp)
  400fd4:	e7b60028 	swc1	$f22,40(sp)
  400fd8:	e7b7002c 	swc1	$f23,44(sp)
  400fdc:	e7b40020 	swc1	$f20,32(sp)
  400fe0:	e7b50024 	swc1	$f21,36(sp)
  400fe4:	afbf001c 	sw	ra,28(sp)
  400fe8:	46006506 	mov.s	$f20,$f12
  400fec:	afbc0018 	sw	gp,24(sp)
  400ff0:	8f998088 	lw	t9,-32632(gp)
  400ff4:	00000000 	nop
  400ff8:	0320f809 	jalr	t9
  400ffc:	00000000 	nop
  401000:	8fbc0010 	lw	gp,16(sp)
  401004:	00000000 	nop
  401008:	8f8380ac 	lw	v1,-32596(gp)
  40100c:	00000000 	nop
  401010:	8c630000 	lw	v1,0(v1)
  401014:	2402ffff 	li	v0,-1
  401018:	46000586 	mov.s	$f22,$f0
  40101c:	10620021 	beq	v1,v0,4010a4 <__my_expf+0xe4>
  401020:	4600a306 	mov.s	$f12,$f20
  401024:	8f998090 	lw	t9,-32624(gp)
  401028:	00000000 	nop
  40102c:	0320f809 	jalr	t9
  401030:	00000000 	nop
  401034:	8fbc0010 	lw	gp,16(sp)
  401038:	3c0142b1 	lui	at,0x42b1
  40103c:	34217180 	ori	at,at,0x7180
  401040:	44811000 	mtc1	at,$f2
  401044:	3c013f80 	lui	at,0x3f80
  401048:	44810000 	mtc1	at,$f0
  40104c:	4614103c 	c.lt.s	$f2,$f20  // this is wrongly put ahead ?
  401050:	10400013 	beqz	v0,4010a0 <__my_expf+0xe0>
  401054:	00000000 	nop
  401058:	45010012 	bc1t	4010a4 <__my_expf+0xe4>
  40105c:	00000000 	nop
  401060:	3c01c2cf 	lui	at,0xc2cf
  401064:	3421f1b5 	ori	at,at,0xf1b5
  401068:	44810000 	mtc1	at,$f0
  40106c:	8f848018 	lw	a0,-32744(gp)
  401070:	00000000 	nop
  401074:	24843fa8 	addiu	a0,a0,16296
  401078:	4600a03c 	c.lt.s	$f20,$f0
  40107c:	3c014000 	lui	at,0x4000
  401080:	44810000 	mtc1	at,$f0
  401084:	45010007 	bc1t	4010a4 <__my_expf+0xe4>
  401088:	00000000 	nop
  40108c:	8f998058 	lw	t9,-32680(gp)
  401090:	00000000 	nop
  401094:	0320f809 	jalr	t9
  401098:	00000000 	nop
  40109c:	8fbc0010 	lw	gp,16(sp)
  4010a0:	4600b006 	mov.s	$f0,$f22
  4010a4:	8fbf001c 	lw	ra,28(sp)
  4010a8:	c7b60028 	lwc1	$f22,40(sp)
  4010ac:	c7b7002c 	lwc1	$f23,44(sp)
  4010b0:	c7b40020 	lwc1	$f20,32(sp)
  4010b4:	c7b50024 	lwc1	$f21,36(sp)
  4010b8:	03e00008 	jr	ra
  4010bc:	27bd0030 	addiu	sp,sp,48

----------------------------end of the second problem------------------------

----------------------------begin of 3rd problem-----------------------------
3.wrong code from gcc -O2,No.2
hi,
    it seems fixed in latest gcc (3.1). gcc 2.96 does produce wrong code
 for -O2, i posted another example one or two days ago.

�� 2002-02-08 19:37:00 you wrote��
>I found gcc 2.96 (gcc-2.96-99.1.mipsel.rpm in H.J.Lu's RedHat 7.1)
>outputs wrong code for this short program.
>
>int foo(unsigned long long a, unsigned long long b)
>{
>	int as, bs;
>	as = a >> 63;
>	bs = b >> 63;
>	if (as != bs)
>		return as || !(b << 1);
>	return (a == b) || as;
>}
>
>int main(int argc, char **argv)
>{
>	return foo(0, 0xffffffffffffffffull);
>}
>
>This program must return 0.  But compiling with -O2 it returns 1 !!
>
># gcc -O -o foo foo.c;./foo;echo $?
>0
># gcc -O2 -o foo foo.c;./foo;echo $?
>1
>
>Output wth -O -g are:
>
>00400780 <foo>:
>  400780:	3c1c0fc0 	lui	gp,0xfc0
>  400784:	279c78b0 	addiu	gp,gp,30896
>  400788:	0399e021 	addu	gp,gp,t9
>  40078c:	00804021 	move	t0,a0
>  400790:	00a04821 	move	t1,a1
>  400794:	000917c2 	srl	v0,t1,0x1f
>  400798:	00402821 	move	a1,v0
>  40079c:	000717c2 	srl	v0,a3,0x1f
>  4007a0:	10a2000d 	beq	a1,v0,4007d8 <foo+0x58>
>  4007a4:	00001821 	move	v1,zero
>  4007a8:	14a00008 	bnez	a1,4007cc <foo+0x4c>
>  4007ac:	00004021 	move	t0,zero
>  4007b0:	00071840 	sll	v1,a3,0x1
>  4007b4:	000627c2 	srl	a0,a2,0x1f
>  4007b8:	00641825 	or	v1,v1,a0
>  4007bc:	00061040 	sll	v0,a2,0x1
>  4007c0:	00621025 	or	v0,v1,v0
>  4007c4:	14400002 	bnez	v0,4007d0 <foo+0x50>
>  4007c8:	00000000 	nop
>  4007cc:	24080001 	li	t0,1
>  4007d0:	03e00008 	jr	ra
>  4007d4:	01001021 	move	v0,t0
>  4007d8:	15060003 	bne	t0,a2,4007e8 <foo+0x68>
>  4007dc:	00001021 	move	v0,zero
>  4007e0:	11270003 	beq	t1,a3,4007f0 <foo+0x70>
>  4007e4:	00000000 	nop
>  4007e8:	10a00002 	beqz	a1,4007f4 <foo+0x74>
>  4007ec:	00000000 	nop
>  4007f0:	24020001 	li	v0,1
>  4007f4:	03e00008 	jr	ra
>  4007f8:	00000000 	nop
>
>Output with -O2 -g are:
>
>00400780 <foo>:
>  400780:	3c1c0fc0 	lui	gp,0xfc0
>  400784:	279c78b0 	addiu	gp,gp,30896
>  400788:	0399e021 	addu	gp,gp,t9
>  40078c:	00805021 	move	t2,a0
>  400790:	00c04021 	move	t0,a2
>  400794:	00a05821 	move	t3,a1
>  400798:	00e04821 	move	t1,a3
>  40079c:	000b17c2 	srl	v0,t3,0x1f
>  4007a0:	000927c2 	srl	a0,t1,0x1f
>  4007a4:	00001821 	move	v1,zero
>  4007a8:	1044000a 	beq	v0,a0,4007d4 <foo+0x54>
>  4007ac:	00002821 	move	a1,zero
>  4007b0:	14400006 	bnez	v0,4007cc <foo+0x4c>
>  4007b4:	24050001 	li	a1,1
>  4007b8:	00091840 	sll	v1,t1,0x1
>  4007bc:	000827c2 	srl	a0,t0,0x1f
>  4007c0:	00641825 	or	v1,v1,a0
>  4007c4:	00081040 	sll	v0,t0,0x1
>  4007c8:	00621025 	or	v0,v1,v0
>  4007cc:	03e00008 	jr	ra
>  4007d0:	00a01021 	move	v0,a1
>  4007d4:	15480003 	bne	t2,t0,4007e4 <foo+0x64>
>  4007d8:	00001821 	move	v1,zero
>  4007dc:	11690003 	beq	t3,t1,4007ec <foo+0x6c>
>  4007e0:	00000000 	nop
>  4007e4:	10400002 	beqz	v0,4007f0 <foo+0x70>
>  4007e8:	00000000 	nop
>  4007ec:	24030001 	li	v1,1
>  4007f0:	03e00008 	jr	ra
>  4007f4:	00601021 	move	v0,v1
>
>
>It seems one 'bnez' in good code (at 4007c4) was lost in bad code.
>
>Is this a know problem?  If so, is there any patches available?
>
>Thank you.
>---
>Atsushi Nemoto 
-----------------------------end of 3rd problem------------------------

�� 2002-02-18 09:16:00 you wrote��
>Hi Zhang,
>
>we're talking to Algorithmics about the possibility of productizing their
>SDE compiler for Linux. If this materializes, we should be able to get the
>GCC issues you mention fixed.
>
>Could you therefore pls. send me examples showing all the GCC issues you
>mention:
>
>1) SNan & QNan handling wrong
>2) Wrong code generated with -O2 (exception handling problem)
>3) Wrong code generated with -O2 (long long type problem)
>
>I would like (small!) examples suitable for inclusion directly in a work 
>specification, so items like "Mozilla doesn't run" is not good enough :-)
>
>Finally, Kjeld E. at MIPS is spending some time on math-emu. So if you have
>specific issues, you can try to mail him as well (kjelde@mips.com).
>
>Problem #4 you report could be either in glibc or math-emu. If it's math-emu
>we'll fix it.
>
>/Hartvig
>
>
>Zhang Fuxin writes:
>> 
>> hi,
>>    There are so many problems on math handling for linux-mips,including:
>> 1. SNaN & QNan handling(both gcc & glibc)
>> 2. gcc2.96 generates wrong code with -O2,at least 
>>      one will lead to exception handling problem(reported by me)
>>      one will lead to some 'long long' type mishandling(reported by Atsushi Nemoto)
>> 
>>    (gcc3.1 seems a lot better,but it has problem to compile glibc.I can't even compile
>>  current glibc cvs code(with dl-conflict.c etc patched) with it. The best result is
>>  a segment fault when using ld.so.1:
>>       ../elf/ld.so.1 --library-path ..:../math:../elf:../dlfcn:../nss:../nis:../rt:../resolv:
>>   ../crypt:../linuxthreads ./rpcgen -Y ../scripts -c rpcsvc/bootparam_prot.x -o 
>>   xbootparam_prot.T
>>   make[1]: *** [xbootparam_prot.stmp] Segmentation fault
>>   )     
>> 3. problems with math-emu
>> 4. other problems to be investigated for its cause,including this one,
>>   
>>        pow(2,7) = 128.0 when rounding = TONEAREST or UPWARD
>>                 = 64.1547.. when rounding = DOWNWARD or TOWARDZERO
>> 
>>  when today i find out the above problem I was feeling almost despaired:(
>> 
>>  I want to fix these problems,if i could.But it concerns so many things that i am not
>>  expert on and no time to dig:(. So any help will be highly appreciated.
>> 
>>  
>> 
>> 
>> Regards
>>             Zhang Fuxin
>>             fxzhang@ict.ac.cn

Regards
            Zhang Fuxin
            fxzhang@ict.ac.cn