Re: Troubles with TLB refills

Ralf Baechle <ralf@oss.sgi.com> · Mon, 5 Mar 2001 11:49:27 +0100

On Mon, Mar 05, 2001 at 01:40:01PM +1000, Liam Davies wrote:

> I am trying to get the 2.4 kernel up and running on the Cobalt boxes.
> At the moment I am trying to get the initial transition from kernel
> to user mode working.
> 
> The elf loader is trying to put stuff in the stack for the new user
> process and *each* call to NEW_AUX_ENTRY is generating a page fault
> that cannot be resolved. The address generated by the BadVAddr on the
> TLBS exception is not correct. Also, I never receive a TLBrefill
> exception on the accesses. It is using the except_vec0_nevada handler.

The fault address 0x10004f4c looks wrong; NEW_AUX_ENTRY should access
something near the top of stack, that is a value a few bytes below
0x80000000.  I wonder if this behaviour is related to this CPU bug -
one which btw. was never acknowledged by QED but identified independantly
by sever Cobalt people back at the time.

[head.S]
	/* TLB refill, EXL == 0, R52x0 "Nevada" version */
        /*
         * This version has a bug workaround for the Nevada.  It seems
         * as if under certain circumstances the move from cp0_context
         * might produce a bogus result when the mfc0 instruction and
         * it's consumer are in a different cacheline or a load instruction,
         * probably any memory reference, is between them.  This is
         * potencially slower than the R4000 version, so we use this
         * special version.
         */
	.set	noreorder
	.set	noat
	LEAF(except_vec0_nevada)
	.set	mips3
	mfc0	k0, CP0_BADVADDR		# Get faulting address
	srl	k0, k0, 22			# get pgd only bits
	lw	k1, current_pgd			# get pgd pointer
	sll	k0, k0, 2
	addu	k1, k1, k0			# add in pgd offset
	lw	k1, (k1)
	mfc0	k0, CP0_CONTEXT			# get context reg
	srl	k0, k0, 1			# get pte offset
	and	k0, k0, 0xff8
	addu	k1, k1, k0			# add in offset
	lw	k0, 0(k1)			# get even pte
	lw	k1, 4(k1)			# get odd pte
	srl	k0, k0, 6			# convert to entrylo0
	mtc0	k0, CP0_ENTRYLO0		# load it
	srl	k1, k1, 6			# convert to entrylo1
	mtc0	k1, CP0_ENTRYLO1		# load it
	nop					# QED specified nops
	nop
	tlbwr					# write random tlb entry
	nop					# traditional nop
	eret					# return from trap
	END(except_vec0_nevada)
[...]

This exception handler has been modified since the version tested in the
Cobalt Qube and I'm not sure if the bug workaround actually got tested
since then.

(Bonus points to whoever writes a good probe for this bug!)

handle_page_fault got called and printed something; therefore the
exception handler cannot possibly have been trashed.  do_page_fault gets
called by via the generic exception handler.  The TLB vectors there are only
taken if there is a TLB entry matching the address in the TLB.  Therefore
your theory about no tlb refill exception cannot be right.  The TLB
dump only displays entries where at least on of the entry0 / entry1
entries is valid, therefore you get an empty dump; maybe that made you
believe you didn't get a TLB reload exception.

  Ralf