Re: Linux port failing on MIPS32 24Kc

joe seb <joe.seb8@xxxxxxxxx> · Mon, 13 Jul 2009 20:17:48 +0530

Ralf,
 
We changed our MIPS FPGA design so that the RAM is now mapped at the start of KSEG0 (0x80000000). We still see the failure with the cache in write-back mode; write-through works fine.
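
For reference, this is how the unmapped KSEG0/KSEG1 windows relate to physical addresses on MIPS32. It is only a minimal sketch: the helper names are made up for illustration and are not the kernel's own macros.

```c
#include <stdint.h>

/* Fixed MIPS32 unmapped segments: both are 512MB windows onto the
 * low 512MB of physical address space. */
#define KSEG0_BASE     0x80000000u  /* unmapped, cacheable   */
#define KSEG1_BASE     0xa0000000u  /* unmapped, uncached    */
#define KSEG_PHYS_MASK 0x1fffffffu  /* strips the segment bits */

/* Hypothetical helper: KSEG0 or KSEG1 virtual address -> physical. */
static inline uint32_t kseg_to_phys(uint32_t vaddr)
{
	return vaddr & KSEG_PHYS_MASK;
}

/* Hypothetical helper: physical -> cached KSEG0 address. */
static inline uint32_t phys_to_kseg0(uint32_t paddr)
{
	return KSEG0_BASE | (paddr & KSEG_PHYS_MASK);
}

/* Hypothetical helper: physical -> uncached KSEG1 address. */
static inline uint32_t phys_to_kseg1(uint32_t paddr)
{
	return KSEG1_BASE | (paddr & KSEG_PHYS_MASK);
}
```

With these, the old RAM base 0x90000000 corresponds to physical 0x10000000, the new base 0x80000000 to physical 0, and the UART at 0xb1a30000 in the log below is physical 0x11a30000 seen through KSEG1.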
 
The failure log we got this time is given below:
 
Linux version 2.6.29.4 () (gcc version 4.3.2 (Sourcery G++ Lite 4.3-51) ) #15 PREEMPT Mon Jul 13 13:22:21 IST 2009
CPU revision is: 0101937c (MIPS 24Kc)
Determined physical RAM map:
User-defined physical RAM map:

 memory: 10000000 @ 00000000 (usable)
Initrd not found or empty - disabling initrd
Zone PFN ranges:
  Normal   0x00000000 -> 0x00010000
Movable zone start PFN for each node
early_node_map[1] active PFN ranges

    0: 0x00000000 -> 0x00010000
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 65024
Kernel command line: mem=256M console=ttyS0,9600 cca=3
Primary instruction cache 32kB, VIPT, 4-way, linesize 32 bytes.

Primary data cache 32kB, 4-way, PIPT, no aliases, linesize 32 bytes
Using cache attribute 3
Writing ErrCtl register=00000000
Readback ErrCtl register=00000000
PID hash table entries: 1024 (order: 10, 4096 bytes)

CPU frequency 50.00 MHz
console [ttyS0] enabled
Dentry cache hash table entries: 32768 (order: 5, 131072 bytes)
Inode-cache hash table entries: 16384 (order: 4, 65536 bytes)
Memory: 257092k/262144k available (1211k kernel code, 4728k reserved, 234k data, 692k init, 0k highmem)
Calibrating delay loop... 33.17 BogoMIPS (lpj=165888)
Mount-cache hash table entries: 512
VFS: Disk quotas dquot_6.5.2
Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
msgmni has been set to 502

Serial: 8250/16550 driver, 2 ports, IRQ sharing enabled
serial8250: ttyS0 at MMIO 0xb1a30000 (irq = 188) is a 16550A
serial8250: ttyS1 at MMIO 0xb1a40000 (irq = 189) is a 16550A
Freeing unused kernel memory: 692k freed

Algorithmics/MIPS FPU Emulator v1.5
CPU 0 Unable to handle kernel paging request at virtual address ffffb32c, epc == 8f15b32c, ra == 8f15b32c
CPU 0 Unable to handle kernel paging request at virtual address 0000001c, epc == 800210e4, ra == 8002120c
Oops[#1]:
Cpu 0
$ 0   : 00000000 7fccc8e8 fffffff8 00000000
$ 4   : 8021dbf0 00000000 8f81fc60 80220000
$ 8   : 00000000 9b915d6d 8021dbc0 fffffffc
$12   : 0000000f 00000002 80136f7c 00000010

$16   : 00000000 00000000 8021dbc0 8f821eb0
$20   : 00000000 00000000 00000003 ffffffff
$24   : 00000000 0043e4b0                  
$28   : 8f820000 8f821e00 8021dbc0 8002120c
Hi    : 0098963a
Lo    : ebe5df80

epc   : 800210e4 set_next_entity+0x14/0x8c
    Not tainted
ra    : 8002120c pick_next_task_fair+0x90/0xe4
Status: 10004402    KERNEL EXL 
Cause : 00800008
BadVA : 0000001c
PrId  : 0101937c (MIPS 24Kc)

Process init (pid: 1, threadinfo=8f820000, task=8f81fb08, tls=0053f470)
Stack : 00000000 800221a8 00000000 80021550 00000000 00000000 8021dbc0 8002120c
        8f81fb08 80020a90 8f81fb00 8f81fb08 80130000 8f81fb08 8f81fb08 80004924

        7fcccc04 800051a8 8f81fc60 8f168de0 8f81fc60 8f821eb4 8f81fc60 00000001
        8f81fb00 8f81fb08 8f81fbfc 8f821eb0 00000000 00000000 00000003 ffffffff
        00000006 8002c95c 8021dbc0 8f15b238 00000001 8f81fb08 00000003 00000000

        ...
Call Trace:
[<800210e4>] set_next_entity+0x14/0x8c
[<8002120c>] pick_next_task_fair+0x90/0xe4
[<80004924>] schedule+0x5e4/0x72c
[<8002c95c>] do_wait+0x31c/0x4a0
[<8002cbe0>] sys_wait4+0x100/0x174

[<80002398>] stack_done+0x20/0x3c

Code: afb10014  afbf001c  afb00010 <8ca2001c> 00a08821  10400008  00809021  8c820024  24b00008
note: init[1] exited with preempt_count 2
BUG: scheduling while atomic: init/1/0x10000003
Call Trace:

[<80003fd0>] dump_stack+0x8/0x38
[<800048a0>] schedule+0x560/0x72c
[<80021774>] __cond_resched+0x18/0x38
[<80004c20>] _cond_resched+0x50/0x58
[<80085874>] unmap_vmas+0x614/0x6d4

[<8008b260>] exit_mmap+0xe8/0x1f8
[<80025dd8>] mmput+0x9c/0x194
[<8002aa40>] exit_mm+0x15c/0x268
[<8002ce28>] do_exit+0xf4/0x88c
[<8000e110>] nmi_exception_handler+0x0/0x34
CPU 0 Unable to handle kernel paging request at virtual address 0000001c, epc == 800210e4, ra == 8002120c
Oops[#2]:
Cpu 0
$ 0   : 00000000 00000001 fffffff8 00000000
$ 4   : 8021dbf0 00000000 8f81eef8 80220000

$ 8   : 00000000 9b915d6d 8021dbc0 ffff8db3
$12   : 00001b5c 80138ef0 80136f7c 00000010
$16   : 00000000 00000000 8021dbc0 00000000
$20   : 00000000 00000000 00000000 00000000
$24   : 00000019 80121a84                  

$28   : 8fb08000 8fb09ee8 8021dbc0 8002120c
Hi    : 0098963b
Lo    : 67e02780
epc   : 800210e4 set_next_entity+0x14/0x8c
    Tainted: G      D   
ra    : 8002120c pick_next_task_fair+0x90/0xe4
Status: 10004402    KERNEL EXL 

Cause : 00800008
BadVA : 0000001c
PrId  : 0101937c (MIPS 24Kc)
Process events/0 (pid: 4, threadinfo=8fb08000, task=8f81eda0, tls=00000000)
Stack : 00000000 800221a8 80160000 802637b4 00000000 00000000 8021dbc0 8002120c

        8f81eda0 80020a90 8faf4a88 8faf4a80 80130000 8faf4a80 8f81eda0 80004924
        8faf4a80 800959b0 8fb09f80 8fb09f80 8f81eef8 8faf4a88 8f81eef8 800436a0
        8faf4a88 8faf4a80 8fb09f80 00000000 00000000 00000000 00000000 00000000

        00000000 8003e4f8 8f81eef8 8015e000 8f81eef8 80217540 00000000 8f81eda0
        ...
Call Trace:
[<800210e4>] set_next_entity+0x14/0x8c
[<8002120c>] pick_next_task_fair+0x90/0xe4
[<80004924>] schedule+0x5e4/0x72c

[<8003e4f8>] worker_thread+0xc4/0xcc
[<80042d14>] kthread+0x58/0xa4
[<800095ec>] kernel_thread_helper+0x10/0x18

Code: afb10014  afbf001c  afb00010 <8ca2001c> 00a08821  10400008  00809021  8c820024  24b00008
note: events/0[4] exited with preempt_count 2
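
As a starting point for reading the two Oopses above: the word in angle brackets on the "Code:" line is the faulting instruction, and the Cause register carries the exception code in bits 6..2. The following is a minimal sketch of decoding both by hand; the struct and function names are made up for illustration.

```c
#include <stdint.h>

/* Fields of a MIPS32 I-type instruction (loads, stores, immediates). */
struct mips_itype {
	unsigned opcode;  /* bits 31..26 */
	unsigned rs;      /* bits 25..21, base register for loads/stores */
	unsigned rt;      /* bits 20..16, target register */
	int16_t  imm;     /* bits 15..0, sign-extended offset */
};

/* Hypothetical helper: split an instruction word into I-type fields. */
static struct mips_itype decode_itype(uint32_t insn)
{
	struct mips_itype d;

	d.opcode = (insn >> 26) & 0x3f;
	d.rs     = (insn >> 21) & 0x1f;
	d.rt     = (insn >> 16) & 0x1f;
	d.imm    = (int16_t)(insn & 0xffff);
	return d;
}

/* Address a load/store touches: base register plus signed offset. */
static uint32_t effective_address(uint32_t base, int16_t imm)
{
	return base + (uint32_t)(int32_t)imm;
}

/* ExcCode field of the CP0 Cause register (bits 6..2). */
static unsigned cause_exccode(uint32_t cause)
{
	return (cause >> 2) & 0x1f;
}
```

Applied to 0x8ca2001c this gives opcode 0x23 (lw), rs = 5 ($5, i.e. a1), rt = 2 and offset 0x1c. $5 is 00000000 in both register dumps, so the load address is 0 + 0x1c, which matches BadVA 0000001c. Cause 00800008 has ExcCode 2 (TLB exception on a load or instruction fetch). In other words, set_next_entity() appears to be dereferencing a NULL pointer held in a1, which is likely its second argument under the o32 calling convention.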

Any suggestions on debugging this?
 
Thanks and Regards,
Joe


On Wed, Jul 8, 2009 at 4:07 PM, Ralf Baechle <ralf@xxxxxxxxxxxxxx> wrote:




On Wed, Jul 08, 2009 at 01:37:42PM +0530, joe seb wrote:

> We are trying to port the 2.6.29.4 Linux kernel from the linux-mips.org
> site to our MIPS 24K based platform, and we see issues when we use the
> cache in write-back mode. The cache with write-through configuration
> works fine.
> We use:
> Linux kernel - 2.6.29.4
> GNU cross tools - 4.3.2
> Busybox - 1.14.1
> U-boot - 2009.03
>
> Our platform has 256MB of RAM, and it is mapped to the second 256MB of
> KSEG0 (0x90000000 - 0x9FFFFFFF) and KSEG1 (0xB0000000 - 0xBFFFFFFF). We
> specify "mem=16M@256M" as a boot parameter (we want the kernel to use
> only the first 16MB). The cache initialization for KSEG0 is done in
> u-boot.
>
> The error we get when cache is configured as write-back is given below:
>
> --------------------
> Linux version 2.6.29.4 (gcc version 4.3.2 (Sourcery G++ Lite 4.3-51) ) #11 PREEMPT Tue Jul 7 21:16:00 IST 2009
> CPU revision is: 0101937c (MIPS 24Kc)
> Determined physical RAM map:
> User-defined physical RAM map:
>  memory: 01000000 @ 10000000 (usable)

> Wasting 2097152 bytes for tracking 65536 unused pages
> Initrd not found or empty - disabling initrd
> Zone PFN ranges:
>   Normal   0x00010000 -> 0x00011000
> Movable zone start PFN for each node

> early_node_map[1] active PFN ranges
>     0: 0x00010000 -> 0x00011000
> Built 1 zonelists in Zone order, mobility grouping off.  Total pages: 4064
> Kernel command line: mem=16M@256M console=ttyS0,9600 cca=3

> Primary instruction cache 32kB, VIPT, 4-way, linesize 32 bytes.
> Primary data cache 32kB, 4-way, PIPT, no aliases, linesize 32 bytes
> Using cache attribute 3
> Writing ErrCtl register=00000000

> Readback ErrCtl register=00000000
> PID hash table entries: 64 (order: 6, 256 bytes)
> CPU frequency 50.00 MHz
> console [ttyS0] enabled
> Dentry cache hash table entries: 2048 (order: 1, 8192 bytes)

> Inode-cache hash table entries: 1024 (order: 0, 4096 bytes)
> Memory: 13776k/16384k available (1210k kernel code, 2608k reserved, 234k data, 676k init, 0k highmem)
> Calibrating delay loop... 33.17 BogoMIPS (lpj=165888)

> Mount-cache hash table entries: 512
> VFS: Disk quotas dquot_6.5.2
> Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
> msgmni has been set to 26
> Serial: 8250/16550 driver, 2 ports, IRQ sharing enabled

> serial8250: ttyS0 at MMIO 0xa1a30000 (irq = 188) is a 16550A
> serial8250: ttyS1 at MMIO 0xa1a40000 (irq = 189) is a 16550A
> Freeing unused kernel memory: 676k freed
> Algorithmics/MIPS FPU Emulator v1.5

> CPU 0 Unable to handle kernel paging request at virtual address cccccccc, epc == cccccccc, ra == cccccccc
> Oops[#1]:
> Cpu 0
> $ 0   : 00000000 0053934a 00000000 00000001
> $ 4   : 00000001 00000000 00000000 0000009a

> $ 8   : 00000010 900038c0 00532730 00532730
> $12   : 00000018 90284880 00020000 00000034
> $16   : cccccccc 9016c320 9016aae0 90213520
> $20   : 00000000 9016a960 00000001 90fa9680
> $24   : 00000000 9009a414

> $28   : 90176000 90177d58 9016aae0 cccccccc
> Hi    : 0000000d
> Lo    : 0000004a
> epc   : cccccccc 0xcccccccc
>     Not tainted
> ra    : cccccccc 0xcccccccc
> Status: 10004403    KERNEL EXL IE

> Cause : 10800008
> BadVA : cccccccc
> PrId  : 0101937c (MIPS 24Kc)
> Process rcS (pid: 11, threadinfo=90176000, task=90200500, tls=0053f470)
> Stack : cccccccc cccccccc 90200500 9016aae0 90213520 00000000 90213520 00000000
>         9016aae0 90213520 00000000 90025c28 90200500 90faf9c0 90200500 00000107
>         9016aae0 00000107 9016aae0 900a04c8 00000003 902a4a40 902a4a48 00000000
>         902a4a44 900b3ad8 90fb1880 90fa9680 00000000 00000003 90fb1880 90fa9680
>         90177f30 90da7b20 cccccccc cccccccc cccccccc cccccccc cccccccc cccccccc
>         ...
> Call Trace:
> [<90025c28>] mmput+0x9c/0x194
> [<900a04c8>] flush_old_exec+0x47c/0x988

> [<900b3ad8>] alloc_fd+0x9c/0x1a4
> [<90086c88>] handle_mm_fault+0x9a8/0x107c
> [<9002f7c4>] do_softirq+0xc8/0xd0
> [<900cc60c>] load_elf_binary+0x0/0x1410
> [<9009fd9c>] search_binary_handler+0xa0/0x2bc

> [<900a138c>] do_execve+0x298/0x300
> [<900a4c60>] getname+0x28/0xc8
> [<9000c714>] sys_execve+0x4c/0x78
> [<90002398>] stack_done+0x20/0x3c
>
> Code: (Bad address in epc)

> do_cpu invoked from kernel context![#2]:
> Cpu 0
> $ 0   : 00000000 90210000 9016a98c 00000001
> $ 4   : 00000002 00000003 90168468 00000000
> $ 8   : 000007c4 00000004 9016846c 00000001
> $12   : ffffff80 00000000 90136f7c 00000010

> $16   : 00000000 00000000 90200500 90213520
> $20   : 9016a994 90177ca8 00000000 90fa9680
> $24   : 00000000 90121648
> $28   : 90176000 90177b48 9016aae0 90fa9680
> Hi    : 0098963b
> Lo    : 38c9b600

> epc   : 90fa9680 0x90fa9680
>     Tainted: G      D
> ra    : 90fa9680 0x90fa9680
> Status: 10004403    KERNEL EXL IE
> Cause : 1080002c
> PrId  : 0101937c (MIPS 24Kc)
> Process rcS (pid: 11, threadinfo=90176000, task=90200500, tls=0053f470)

> Stack : 9016aae0 900041c4 00000000 00000000 90177ca8 90177ca8 0000000b 90200500
>         90200500 cccccccc 00000000 9002cc78 90200658 9016a960 9020065c 90213520
>         90152d44 90177ca8 00000001 cccccccc 00000000 90177ca8 90152d44 90177ca8
>         90200500 cccccccc 00000000 90177ca8 00000000 90fa9680 9016aae0 9000e0d4
>         cccccccc 90220000 ffffffff 00000e89 90177bec cccccccc 9016a960 90010f58
>         ...

> Call Trace:
> [<900041c4>] printk+0x24/0x30
> [<9002cc78>] do_exit+0xf4/0x88c
> [<9000e0d4>] nmi_exception_handler+0x0/0x34
> [<90010f58>] do_page_fault+0x2e0/0x350

> [<90070234>] rmqueue_bulk+0x54/0xd8
> [<900b0d48>] touch_atime+0xf8/0x174
> [<9006c7e8>] generic_file_aio_read+0x4d8/0x8d8
> [<90000404>] ret_from_exception+0x0/0x10
> [<900038c0>] __bzero+0xc4/0x164

> [<9009a414>] do_sync_read+0x0/0x168
> [<90025c28>] mmput+0x9c/0x194
> [<900a04c8>] flush_old_exec+0x47c/0x988
> [<900b3ad8>] alloc_fd+0x9c/0x1a4
> [<90086c88>] handle_mm_fault+0x9a8/0x107c

> [<9002f7c4>] do_softirq+0xc8/0xd0
> [<900cc60c>] load_elf_binary+0x0/0x1410
> [<9009fd9c>] search_binary_handler+0xa0/0x2bc
> [<900a138c>] do_execve+0x298/0x300
> [<900a4c60>] getname+0x28/0xc8

> [<9000c714>] sys_execve+0x4c/0x78
> [<90002398>] stack_done+0x20/0x3c
>
> Code: 040a001a  e5a8b400  4018e618 <464c457f> 00010101  00000000  00000000  00080002  00000001

> Fixing recursive fault but reboot is needed!
> -------------------------------
>
> We get crashes at different places and the above crash is one of them.
> Do you think this failure is due to the wrong cache configuration or related

> to the d-cache aliasing problem?
>
> The cache details of our platform:
> D-cache: 32KB, 4-way, 32B line size, virtually indexed and physically
> tagged, Config7[AR] bit is set (alias is removed by the hardware).


Aliases disabled in hardware, yet you suspect aliases?  Doesn't make sense.


> I-cache:  32KB, 4-way, 32B line size,virtually indexed and physically tagged
>
>
> Is there any similar 24K platform supported in the Linux kernel that we
> can refer to for the configuration?


There has been a kernel bug for a while which, on platforms with a non-zero
memory start address, would effectively disable part of the cache code.
Your description above, including the changed behaviour between write-through
and write-back caches, is consistent with that bug.  Commit
67227819d6dd07f6ec225ea59c67aff3ba936e25 fixes this issue.  For your
convenience I append it below.

I'd appreciate feedback on your test results with this patch.


(Why do people use non-zero starting addresses for memory?  Handling of
cache error exceptions is hard enough as it is, but with no memory in the
low 32k the design idea of the cache architecture, namely that stores
relative to $zero can be used, goes down the drain, and (not considering
platform-specific solutions here) such exceptions can only be handled by
burning the scarce resource of a TLB entry for an extremely rare event ...)

 Ralf

From 67227819d6dd07f6ec225ea59c67aff3ba936e25 Mon Sep 17 00:00:00 2001
From: Ralf Baechle <ralf@xxxxxxxxxxxxxx>
Date: Fri, 3 Jul 2009 07:11:15 +0100
Subject: [PATCH] MIPS: Fix pfn_valid()

For systems which do not define PHYS_OFFSET as 0 pfn_valid() may falsely
have returned 0 on most configurations.  Bug introduced by commit
752fbeb2e3555c0d236e992f1195fd7ce30e728d (linux-mips.org) rsp.
6f284a2ce7b8bc49cb8455b1763357897a899abb (kernel.org) titled "[MIPS]
FLATMEM: introduce PHYS_OFFSET."

Signed-off-by: Ralf Baechle <ralf@xxxxxxxxxxxxxx>

diff --git a/arch/mips/include/asm/page.h b/arch/mips/include/asm/page.h
index dc0eaa7..96a14a4 100644
--- a/arch/mips/include/asm/page.h
+++ b/arch/mips/include/asm/page.h
@@ -165,7 +165,14 @@ typedef struct { unsigned long pgprot; } pgprot_t;

 #ifdef CONFIG_FLATMEM

-#define pfn_valid(pfn)         ((pfn) >= ARCH_PFN_OFFSET && (pfn) < max_mapnr)
+#define pfn_valid(pfn)                                                 \
+({                                                                     \
+       unsigned long __pfn = (pfn);                                    \
+       /* avoid <linux/bootmem.h> include hell */                      \
+       extern unsigned long min_low_pfn;                               \
+                                                                       \
+       __pfn >= min_low_pfn && __pfn < max_mapnr;                      \
+})

 #elif defined(CONFIG_SPARSEMEM)
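
To make the change easier to reason about outside the kernel tree, the two forms of the check can be written as plain functions with the bounds passed in explicitly. This is only a sketch: the function names are made up, and what ARCH_PFN_OFFSET, min_low_pfn and max_mapnr actually evaluate to on a given board is an assumption that has to be checked on the target.

```c
#include <stdbool.h>

/* Old check: the lower bound is the compile-time ARCH_PFN_OFFSET,
 * which is derived from PHYS_OFFSET. Passed in here as a parameter. */
static bool pfn_valid_old(unsigned long pfn,
			  unsigned long arch_pfn_offset,
			  unsigned long max_mapnr)
{
	return pfn >= arch_pfn_offset && pfn < max_mapnr;
}

/* New check: the lower bound is min_low_pfn, which is set at boot
 * from the memory map that was actually registered. */
static bool pfn_valid_new(unsigned long pfn,
			  unsigned long min_low_pfn,
			  unsigned long max_mapnr)
{
	return pfn >= min_low_pfn && pfn < max_mapnr;
}
```

The two functions agree for every PFN only when the lower bounds agree. Taking the PFN range 0x10000 -> 0x11000 from the quoted log as assumed values for min_low_pfn and max_mapnr, pfn_valid_new(0x10800, 0x10000, 0x11000) is true, while pfn_valid_old() with a lower bound that does not match the real start of RAM gives a different answer for some PFNs.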