Subject: tmpfs/ramfs with no swap oopses on Au1550/mips(el)

"elmar gerdes" <elmar.gerdes@xxxxxxx> · Wed, 23 May 2007 17:09:29 +0200

Hello,

I am experiencing oopses when using tmpfs heavily on an embedded

	Au1550-based board (MIPS 32-bit core, little-endian, 128MB RAM).

Everything I need seemed to work (PCI, USB, JFFS2, busybox, tools like
mtools, WLAN, network, ...).  The system currently uses busybox-1.00 and
glibc-2.3.2.  Everything has been compiled using mipsel-linux-gcc 3.3.3
created from Dan Kegel's crosstool-0.38.

But when I use tmpfs heavily using mcopy or appending to a file using

	cat filename >> /tmp/xxx

I either get "No space left on device" (kernel 2.6.21-rc7) which is what
I expect, or I get a sequence of oopses (2.6.14, 2.6.18.2, 2.6.19.2).
When getting "No space left on device", inserting a USB-stick might then
cause an oops ("Bad page state in process 'hotplug'").

I have seen various types of oopses, but they all(?)/usually start with
one like these:

  CPU 0 Unable to handle kernel paging request at virtual address\
      5da7bee4, epc == 801795c0, ra == 802b61c8

  CPU 0 Unable to handle kernel paging request at virtual address\
      00000000, epc == 8015b728, ra == 8015ba20

  CPU 0 Unable to handle kernel paging request at virtual address\
      000000c4, epc == 80160b58, ra == 801604c4

  CPU 0 Unable to handle kernel paging request at virtual address\
      000000c4, epc == 80160b58, ra == 801604c4

  CPU 0 Unable to handle kernel paging request at virtual address\
      000000d4, epc == 8015e724, ra == 8015dee8

  CPU 0 Unable to handle kernel paging request at virtual address\
      00000104, epc == 801748bc, ra == 80174844

  CPU 0 Unable to handle kernel paging request at virtual address\
      00000000, epc == 80159858, ra == 80159b70

  CPU 0 Unable to handle kernel paging request at virtual address\
      24ba4d88, epc == 801ace7c, ra == 801b1e84

  CPU 0 Unable to handle kernel paging request at virtual address\
      00000128, epc == 80155628, ra == 8015575c

I tested this using kernel 2.6.19.2, 2.6.18.2, 2.6.14, and 2.6.21-rc7
like this:

# uname -a
Linux mybox 2.6.18.2 #1 Tue May 22 16:58:12 CEST 2007 mips unknown

df(1) under 2.6.18.2 does not show /var, being 40M of tmpfs as 2.6.19.2
would tell us.  /tmp is a symlink to /var/tmp.

Filling tmpfs:

# cat /lib/libc.so.6 >> /tmp/abc
# cat /lib/libc.so.6 >> /tmp/abc
# cat /lib/libc.so.6 >> /tmp/abc
# cat /lib/libc.so.6 >> /tmp/abc
# cat /lib/libc.so.6 >> /tmp/abc
# cat /lib/libc.so.6 >> /tmp/abc

# ll -h /tmp/abc*
-rw-r--r--    1 root     root         9.1M Jan  1 00:02 /tmp/abc

# cp /tmp/abc /tmp/abc2
# cp /tmp/abc /tmp/abc3
# cp /tmp/abc /tmp/abc4

# ll -h /tmp/abc*
-rw-r--r--    1 root     root         9.1M Jan  1 00:02 /tmp/abc
-rw-r--r--    1 root     root         9.1M Jan  1 00:02 /tmp/abc2
-rw-r--r--    1 root     root         9.1M Jan  1 00:02 /tmp/abc3
-rw-r--r--    1 root     root         9.1M Jan  1 00:02 /tmp/abc4

# cp /tmp/abc /tmp/abc5

On 2.6.14, 2.6.18.2, and 2.6.19.2 I get this:

CPU 0 Unable to handle kernel paging request at virtual address 00000000,\
   epc == 80159858, ra == 80159b70
Oops[#1]:
Cpu 0
$ 0   : 00000000 1000fc00 8f818018 810a1458
$ 4   : 00100100 00000000 00200200 803c2054
$ 8   : 0000475d 00000001 00000001 fffffff8
$12   : 0055a000 00000000 00001000 03200008
$16   : 803c2028 1000fc01 803c2048 803c2028
$20   : 00000000 00000000 000280d2 00000000
$24   : 00000000 2ac6342c
$28   : 804c6000 804c7d90 803c2550 80159b70
Hi    : 307d2bf7
Lo    : a60e8300
epc   : 80159858 buffered_rmqueue+0x9c/0x1f4     Not tainted
ra    : 80159b70 get_page_from_freelist+0x10c/0x134
Status: 1000fc02    KERNEL EXL
Cause : 0080000c
BadVA : 00000000
PrId  : 03030200
Modules linked in: aes blowfish llc vfat fat
Process cp (pid: 853, threadinfo=804c6000, task=804d2500)
Stack : 804d2500 00000000 00000000 00000000 803c2550 00000000 00000000 00000000
        00000044 00000000 00000004 80159b70 100d0eb0 87b54e60 00000001 802327fc
        00000044 100cf050 803c2550 804d2500 000280d2 87c8abc8 00000000 00000000
        803c2550 00000558 00000010 80159c00 000280d2 80170d28 801536ac 801549a0
        00001000 386d4434 80704880 00000000 00000000 87c8abc8 87c8ac28 00000000
        ...
Call Trace:
 [<80159b70>] get_page_from_freelist+0x10c/0x134
 [<802327fc>] radix_tree_insert+0x190/0x1e0
 [<80159c00>] __alloc_pages+0x68/0x2fc
 [<80170d28>] shmem_swp_alloc+0xb4/0x254
 [<801536ac>] unlock_page+0x68/0xd8
 [<801549a0>] file_read_actor+0x0/0x100
 [<80171d94>] shmem_getpage+0x234/0x65c
 [<80172824>] shmem_file_write+0x160/0x2b0
 [<80172d08>] shmem_file_read+0x70/0x80
 [<8017a1d4>] vfs_write+0xd4/0x1a0
 [<801340f0>] update_process_times+0x58/0x90
 [<801340d4>] update_process_times+0x3c/0x90
 [<8017a394>] sys_write+0x54/0xa0
 [<8010d380>] stack_done+0x20/0x3c

Code: 8c650004  34840100  34c60200 <aca20000> ac450004  ac640000\
   ac660004  8e020020  2472ffe8
Break instruction in kernel code[#2]:
Cpu 0
(((oops #2 follows)))

Code: 00000040  080579ca  8cc300d4 <0000800d> 080579bf  00000000\
   27bdffe8  afb00010  afbf0014
Fixing recursive fault but reboot is needed!
Unhandled kernel unaligned access[#3]:
Cpu 0
(((oops #3 follows)))

Code: 30630001  14600044  00000000 <8c820000> 000211c2  30420001\
   1040003c  00000000  8c90001c
Break instruction in kernel code[#4]:
Cpu 0
(((oops #4 follows)))

Code: 0000800d  0805d97b  00000000 <0000800d> 0805d979  8ca20018\
   0805d973  8ca5000c  27bdffe0
Kernel panic - not syncing: Attempted to kill init!
 Break instruction in kernel code[#5]:
Cpu 0
(((oops #5 follows)))

Code: 8fb00010  03e00008  27bd0020 <0000800d> 0805d9c7  8c840018\
   0805d9c1  8c84000c  03e00008
Kernel panic - not syncing: Aiee, killing interrupt handler!
(((no more messages)))

On 2.6.21-rc7 I get this:

Bad page state in process 'hotplug'
page:8100cd80 flags:0x00000000 mapping:00000000 mapcount:1 count:0
Trying to fix it up, but a reboot is needed
Backtrace:
Call Trace:
[<80109964>] dump_stack+0x8/0x34
[<8015aa1c>] bad_page+0x6c/0xb0
[<8015ba50>] free_hot_cold_page+0x1b8/0x1c8
[<8015ba70>] free_hot_page+0x10/0x1c
[<8015ff38>] __page_cache_release+0x138/0x2ec
[<8016020c>] put_page+0x68/0xd4
[<80157398>] filemap_nopage+0x368/0x578
[<8016b930>] do_no_page+0x90/0x470
[<8016c118>] __handle_mm_fault+0x168/0x3dc
[<8010e2c0>] do_page_fault+0x100/0x360
[<80103a80>] ret_from_exception+0x0/0x20

Bad page state in process 'hotplug'
page:8100cd80 flags:0x00080000 mapping:00000000 mapcount:0 count:0
Trying to fix it up, but a reboot is needed
(((same backtrace as before, then
CPU 0 Unable to handle kernel paging request at virtual address\
    00100104, epc == 8015adc4, ra == 8015ba34
(((further messages)))

Some thoughts on this:

 - I can reproduce the oops whether root is a JFFS2 partition or
   NFS-root, so I expect that no MTD/JFFS2-stuff is involved.

 - Before I used 40M of tmpfs, I had 64M (of the 128M of RAM).  Filling
   that tmpfs partition lead to the oops when about 40M were reached.
   Using 32M of tmpfs leads to "No space left on device" (as was seen
   for kernel 2.6.21-rc7).  But inserting a USB-stick can already
   provoke the oops after "No space left"...

 - I tried (a) ramfs and (b) a large ramdisk instead of tmpfs, as I
   expected that the missing swap (as backing store) for tmpfs to be the
   reason.  But both lead to oopses.  So I expect that the missing swap
   is not the reason.

 - Since swap seems to not be involved, I expect that commit
   6ebba0e2f56ee77270a9ef8e92c1b4ec38e5f419 ([MIPS] Fix swap entry for
   MIPS32 36-bit physical address) is not involved either.

 - I could not reproduce the oopses on a x86 system, so I think this
   could/should be mips-specific.

Does someone have an idea on how to fix this?

Thank you

	Elmar