Re: kernel BUG at arch/sparc/mm/fault_64.c:270!

Mikulas Patocka <mpatocka@xxxxxxxxxx> · Sat, 7 Nov 2009 00:36:23 -0500 (EST)

On Thu, 5 Nov 2009, David Miller wrote:

> From: Mikulas Patocka <mpatocka@xxxxxxxxxx>
> Date: Wed, 21 Oct 2009 14:24:59 -0400 (EDT)
> 
> > The fault_code variable that triggered is in l2, it's 0xfe, the fault 
> > address is in l3. Do you have any idea how this could (or couldn't) 
> > happen?
> 
> The fault_code on sparc64 is a bitmask which should contain only the
> following bit values (some of which are exclusive):
> 
> #define FAULT_CODE_WRITE	0x01	/* Write access, implies D-TLB	   */
> #define FAULT_CODE_DTLB		0x02	/* Miss happened in D-TLB	   */
> #define FAULT_CODE_ITLB		0x04	/* Miss happened in I-TLB	   */
> #define FAULT_CODE_WINFIXUP	0x08	/* Miss happened during spill/fill */
> #define FAULT_CODE_BLKCOMMIT	0x10	/* Use blk-commit ASI in copy_page */
> 
> 0xfe is an illegal value.
> 
> I suspect that once you hit this IDE bug, the IDE controller is
> spamming garbage via DMA all over memory corrupting things.
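
To spell out why 0xfe cannot be a legal value, here is a minimal sketch 
in plain userspace C. The FAULT_CODE_* values are the ones quoted above; 
FAULT_CODE_VALID_MASK, fault_code_stray_bits() and fault_code_is_valid() 
are names made up for this illustration and are not kernel code.

```c
#include <stdbool.h>

/* Bit values as quoted above. */
#define FAULT_CODE_WRITE	0x01
#define FAULT_CODE_DTLB		0x02
#define FAULT_CODE_ITLB		0x04
#define FAULT_CODE_WINFIXUP	0x08
#define FAULT_CODE_BLKCOMMIT	0x10

/* Hypothetical: the union of all defined bits (0x1f). */
#define FAULT_CODE_VALID_MASK \
	(FAULT_CODE_WRITE | FAULT_CODE_DTLB | FAULT_CODE_ITLB | \
	 FAULT_CODE_WINFIXUP | FAULT_CODE_BLKCOMMIT)

/* Hypothetical helper: bits set in fault_code that no define covers. */
static inline unsigned int fault_code_stray_bits(unsigned int fault_code)
{
	return fault_code & ~(unsigned int)FAULT_CODE_VALID_MASK;
}

/*
 * Hypothetical helper: true if only defined bits are set. It does not
 * check which of the defined bits are mutually exclusive.
 */
static inline bool fault_code_is_valid(unsigned int fault_code)
{
	return fault_code_stray_bits(fault_code) == 0;
}
```

For the observed value, fault_code_stray_bits(0xfe) gives 0xe0: the 
bits 0x20, 0x40 and 0x80 have no definition at all. 0xfe also has both 
the DTLB and the ITLB bit set.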

One observation contradicts the DMA corruption theory: this BUG() 
triggered twice, on consecutive runs of the same "vmstat" program, and 
at the same faulting address both times.

After I stopped the concurrent I/O and cleared the cache with 
"echo 3 >/proc/sys/vm/drop_caches", the machine ran reliably, including 
that vmstat command (I rebooted it anyway, fearing hidden data 
corruption, but there were no more program failures).

- If the controller had corrupted kernel code, the machine wouldn't 
have recovered.
- If the controller had corrupted common kernel data, the bug would 
have shown up in all processes or in all "vmstat" processes, and it 
wouldn't have gone away after clearing the disk cache.
- If the controller had corrupted per-process kernel data, the 
probability that it corrupted two processes in the same way is small.
- Any other ideas?

Sadly, I don't have a copy of the corrupted binary. I wasn't at the 
console and I found out about the BUG later :-/

Mikulas