Hi Fredrik,

> > This would verify whether the original contents of $17 were a properly
> > sign-extended 32-bit value.  Although for predictable operation I would
> > advise to use:
> >
> > 	sll	k1, $17, 0
> > 	sw	k1, PT_R17(sp)
> > 	lw	k1, PT_R17(sp)
> > 	tne	k1, $17, 12
> >
> > or simply:
> >
> > 	sll	k1, $17, 0
> > 	tne	k1, $17, 12
> > 	sw	$17, PT_R17(sp)
>
> There is a slight complication: the trap appears to be taken before the
> console is ready, hence nothing is displayed. Is there a practical way
> to postpone or recover from a trap? The issue becomes somewhat involved
> since the trap needs to save/restore registers for itself to recover,
> and so might evoke boundless recursion.

 You can use a static variable to hold a flag preventing the diagnostic
check from failing more than once, avoiding recursion.  Just check it here
before doing actual verification and set it at the beginning of the Trap
exception handler in arch/mips/kernel/genex.S.

> From a practical point of view it would be great if backtraces could be
> rate limited, recoverable and possible to copy over network (I don't have
> e.g. a serial port soldered). I will look into other alternatives to try
> to capture this.

 You can halt mid-way through `show_registers' to limit output if all you
have is the virtual terminal and you have to copy information by hand.
Later on in bootstrap you have the netconsole available; see
Documentation/networking/netconsole.txt for details (I have never used
that myself though).

> > Previously you wrote that the problem is with resetting the upper 96 bits
> > (how did you notice that BTW?) rather than bits 63:32 only, so you need a
> > different check.
>
> I suspect 63:32 are the critical bits of the upper 96 bits since SD/LD
> is sufficient. Summary of observations thus far: save/restore works with
> SQ/LQ and SD/LD, but not SW/LW, in a 32-bit kernel ceteris paribus.

 This does look intriguing.

> > Well, you do need to verify your patches for such a possibility, right.
> > I would advise double-checking exception handling indeed, including
> > run-time generated exception handler code in particular.
>
> The extremely early trap indicates a kernel issue, or perhaps register
> garbage during kernel initialisation, that wouldn't be an error? Is the
> run-time code related to genex.S? The R5900 patch sprinkles NOP and
> SYNC.P instructions on it, for various workarounds, but not much else
> apart from reverting db8466c581c "MIPS: IRQ Stack: Unwind IRQ stack onto
> task stack" that otherwise crashes for an unknown reason.

 You cannot assume the firmware leaves properly sign-extended 32-bit values
in registers upon the kernel entry.  I advise truncating the contents of
registers (with SLL by 0) at the beginning of `kernel_entry' in
arch/mips/kernel/head.S for the purpose of avoiding spurious check
triggers in the course of this debugging effort.

 HTH,

  Maciej
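
 For reference, the logic of the check and of the one-shot flag discussed
above can be modelled in plain C roughly as follows.  This is only a
user-space sketch, not kernel code: the names `sll0', `is_sign_extended32',
`check_fired' and `check_once' are all made up for illustration, and the
flag is set at the point of the failed check rather than at the beginning
of the Trap exception handler, purely to keep the model self-contained.

```c
#include <assert.h>	/* only needed for the usage checks */
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of SLL by 0 on a 64-bit MIPS register: keep the
 * low 32 bits and sign-extend bit 31 into bits 63:32.
 */
static uint64_t sll0(uint64_t reg)
{
	uint32_t lo = (uint32_t)reg;

	return (lo & 0x80000000u) ? (0xffffffff00000000ull | lo) : lo;
}

/* Models the SLL/TNE pair: true if TNE would NOT trap. */
static bool is_sign_extended32(uint64_t reg)
{
	return sll0(reg) == reg;
}

/* Static flag so that the diagnostic fires at most once. */
static bool check_fired;

/*
 * Returns true only for the first register value that fails the
 * check; any later failures are ignored, which avoids recursion.
 */
static bool check_once(uint64_t reg)
{
	if (check_fired)
		return false;
	if (is_sign_extended32(reg))
		return false;
	check_fired = true;
	return true;
}
```

 With this model `is_sign_extended32(0xffffffff80000000)' holds, whereas
`is_sign_extended32(0x0000000080000000)' does not, and `check_once' reports
only the first offending value.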