RE: do_ri exception in Linux (MIPS 4kec)

"Nori, Soma Sekhar" <nsekhar@xxxxxx> · Tue, 28 Dec 2004 20:11:16 +0530

Ralf,

Thanks for the input.

To understand what exceptions I was getting (apart from RI), I implemented a counter for each exception.
In "except_vec3_generic" (entry.S), I included some code to increment a counter for each exception received.

<code>
	NESTED(except_vec3_generic, 0, sp)

#if defined(CONFIG_CPU_R5432)
	/* [jsun] work around a nasty bug in R5432  */
	mfc0	k0, CP0_INDEX
#endif 
	mfc0	k1, CP0_CAUSE
      la    k0, exception_counter
	andi	k1, k1, 0x7c                
	addu	k0, k0, k1
      lw    k1, (k0)
      addi  k1, k1, 1
      sw    k1, (k0)

	... (original code follows)

</code>

exception_counter is an array of 32 integers. On printing out the array in do_ri exception handler, I found that only TLB Mod(Code 1), TLBL (Code 2), TLBS (Code 3), syscall (code 8) and RI (code 10) exceptions were received (had count >= 1). With this, will it be safe to assume that RI is the only unwanted exception?

To get hold of exact EPC at which RI is occuring, I tried to clear the EXL bit of status register by adding some more code above the exception counting code in the except_vec3_generic routine.

<code>
	NESTED(except_vec3_generic, 0, sp)

#if defined(CONFIG_CPU_R5432)
	/* [jsun] work around a nasty bug in R5432  */
	mfc0	k0, CP0_INDEX
#endif 
      mfc0    k0, CP0_STATUS
      nop
      ori     k0, k0, 0x2
      xori    k0, k0, 0x2
      mtc0    k0, CP0_STATUS
      nop

	... (Exception counting code follows)

</code>

Surprisingly, the processor does not seem to alow me to clear the EXL bit. I get AdEL (code 4) exception as I complete the "mtc0    k0, CP0_STATUS" instruction. The processor goes into an infinite loop of exceptions and boot-up hangs after printing "Freeing unused kernel memory: 48k freed". Is it not possible for software to clear the EXL bit after it has been set by the hardware? If not, what else can I do to get hold of the correct EPC value where RI is occuring?

Thanks,
Sekhar

-----Original Message-----
From: Ralf Baechle [mailto:ralf@xxxxxxxxxxxxxx]
Sent: Monday, December 27, 2004 6:00 PM
To: Nori, Soma Sekhar
Cc: linux-mips@xxxxxxxxxxxxxx; Iyer, Suraj
Subject: Re: do_ri exception in Linux (MIPS 4kec)

On Thu, Dec 23, 2004 at 04:58:03PM +0530, Nori, Soma Sekhar wrote:

> We are using montavista Linux version 2.4.17, gcc version 2.95.3 running on MIPS 4kec.
> 
> Here is the dump:
> $0 : 00000000 0044def4 000001ac 0000006b 00000000 7fff7c08 00000001 00000000
> $8 : 0000fc00 00000001 00000000 941524d0 00004700 00000000 97fc3ea0 7fff7c08
> $16: 100048a4 100029d8 100029d8 10003020 00000000 7fff7dc8 10003b60 2d8e2163
> $24: 00000001 2ab7bc30                   10008e70 7fff7bf0 04000000 00439e50
> Hi : 00000000
> Lo : 00000001
> epc  : 00439e84    Not tainted
> Status: 0000fc13
> Cause : 10800028
> Process sh (pid: 18, stackpage=97fc2000)
> Stack: 00000001 00000000 2abd0ff0 7fff7c28 10008e70 00000000 10008e6c 00000000
>        100049a0 0042f188 00000000 100029d8 00000001 00000001 7fff7f04 10008e70
>        00427fe4 00427f00 00000000 00000000 10002764 10008e70 10008e70 00000000
>        00000000 00000000 10008e70 00422734 00000001 00000001 7fff7f04 10008e70
>        10008e70 00000003 10008e70 004315cc 00000001 00000000 10002764 00000000
>        10008e70 ...
> Call Trace:
> Code: 00000000  2421dd48  00220821 <8c220000> 00000000  005c1021  00400008  0000
> 0000  8f99802c
> 
> The epc is not in kernel space and ksymoops did not provide any info. The epc keeps changing to different locations in user space over multiple runs.

In a case like this you're likely dealing with double exceptions.  Your
code is taking an exception and the exception handler while running with
c0_status set is taking another exception.  If the first exception handler
is still running with the c0_status.exl bit set the CPU when taking the
second exception it will not record the PC of the second exception and
you will have a seemingly unexplainable exception.

A few processors have the nasty habit of throwing RI receptions or do
similarly weird things when executing code that is mapped through multiple
TLB pages but the 4kEC shouldn't.

  Ralf