Re: Oopses and invalid addresses under Hatari

Eero Tamminen <oak@xxxxxxxxxxxxxx> · Sat, 13 Apr 2019 01:53:24 +0300

Hi,

On 4/11/19 5:10 AM, Michael Schmitz wrote:
> On 10/04/19 10:07 AM, Michael Schmitz wrote:
>> The situation we encounter here (kthread->data == NULL) seems to have
>> been anticipated by the designers of this 'speculative' read of
>> kthread data. We still take a bus error, even though
>> __probe_kernel_read() attempts to suppress that. Unfortunately, our
>> bus_error030() is agnostic to that (at least in the corner case of
>> either invalid MMU descriptor or write protect faults).
>
> OK, I decided to bite the bullet and modify bus_error030() to allow
> falling through to do_page_fault if an invalid page read happens while
> page faults are disabled.
...
> You may want to give this a spin, to see whether it fixes your syscall
> errors.

I tried the bus_error030() patch.

I still sometimes get:
---------------------------------------------------
Data read fault at 0x801740c4 in Super Data (pc=0x2918)
BAD KERNEL BUSERR
Oops: 00000000
PC: [<00002918>] auto_inthandler+0x0/0x28
SR: 2400  SP: (ptrval)  a2: 8017a478
d0: 00000018    d1: 0000001a    d2: 00000000    d3: 8017a480
d4: 00000018    d5: 00000054    a0: 8017a480    a1: 8016f84c
Process exe (pid: 26, task=(ptrval))
Frame format=B ssw=0345 isc=2f00 isb=48e7 daddr=801740c4 dobuf=80176e58
baddr=801740c4 dibuf=801740c4 ver=0
Stack from 009b9ff8:
        02088001 ebf40070
Call Trace:
Code: 0005 61ff 0002 c926 508f 588f 60a2 0000 <42a7> 4878 ffff 2f00 48e7 7ce0 200f 0280 ffff e000 2440 2452 e9ef 010a 0032 0440
...
> d auto_inthandler
auto_inthandler:
$00002918 : 42a7            clr.l     -(sp)
$0000291a : 4878 ffff       pea       $ffffffff.w
$0000291e : 2f00            move.l    d0,-(sp)
$00002920 : 48e7 7ce0       movem.l   d1-d5/a0-a2,-(sp)
---------------------------------------------------

An address above 2GB on the stack, which the autovector handler tries to clear?

And sometimes this:
---------------------------------------------------
Run /init as init process
Unable to handle kernel access at virtual address (ptrval)
Oops: 00000000
PC: [<00002954>] user_inthandler+0x8/0x20
SR: 2d04  SP: (ptrval)  a2: 00000000
d0: 00000000    d1: ffffffff    d2: 00000000    d3: 00000000
d4: 00000001    d5: 00000001    a0: efd4ef3c    a1: 80057d14
Process exe (pid: 25, task=(ptrval))
Frame format=A ssw=0301 isc=200f isb=0280 daddr=00ff2f00 dobuf=00000000
Stack from 009b9ff8:
        02048005 7da40114
Call Trace:
Code: 508f 60ff ffff ff64 42a7 4878 ffff 2f00 <48e7> 7ce0 200f 0280 ffff e000 2440 2452 e9ef 010a 0032 0440 0038 2f0f 2f00 4eb9
Disabling lock debugging due to kernel taint
Segmentation fault
...
> d user_inthandler
user_inthandler:
$0000294c : 42a7                               clr.l     -(sp)
$0000294e : 4878 ffff                          pea       $ffffffff.w
$00002952 : 2f00                               move.l    d0,-(sp)
$00002954 : 48e7 7ce0                          movem.l   d1-d5/a0-a2,-(sp)
---------------------------------------------------

Here the problem is the second value on the stack.

Maybe a missing auto-increment, when an auto-increment instruction page
faults, causes later issues when the bus errors with them are ignored.
=> I'll discuss this on the Hatari list
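
For reference, the two longwords shown after "Stack from" in each dump can be read as a 68010+ four-word (format 0) exception frame: SR word, PC longword, then a format/vector-offset word. The sketch below assumes that is what those eight bytes are; the struct and function names are made up for illustration and are not kernel or Hatari code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper type: fields of a four-word (format 0) m68k frame. */
struct frame0 {
    uint16_t sr;      /* status register at the time of the exception */
    uint32_t pc;      /* saved program counter */
    unsigned format;  /* top 4 bits of the last word */
    unsigned vector;  /* vector number = vector offset / 4 */
};

/* Split the two big-endian longwords printed after "Stack from". */
struct frame0 decode_frame0(uint32_t w0, uint32_t w1)
{
    struct frame0 f;

    f.sr     = (uint16_t)(w0 >> 16);
    f.pc     = ((w0 & 0xffffu) << 16) | (w1 >> 16);
    f.format = (w1 >> 12) & 0xfu;
    f.vector = (w1 & 0xfffu) / 4;
    return f;
}

/* Print one decoded frame; only the four-word format is handled here. */
void print_frame0(uint32_t w0, uint32_t w1)
{
    struct frame0 f = decode_frame0(w0, w1);

    assert(f.format == 0);
    printf("SR=%04x (%s mode, IPL %u) PC=%08lx format=%x vector=%u\n",
           (unsigned)f.sr, (f.sr & 0x2000u) ? "supervisor" : "user",
           (unsigned)((f.sr >> 8) & 7u), (unsigned long)f.pc,
           f.format, f.vector);
}
```

Feeding it the two dumps, print_frame0(0x02088001, 0xebf40070) and print_frame0(0x02048005, 0x7da40114), gives saved PCs of 0x8001ebf4 and 0x80057da4 with SR in user mode, and vectors 28 (level 4 autovector) and 69 (a user interrupt vector). That would fit auto_inthandler and user_inthandler respectively. Under this reading the >2GB values might just be the interrupted user-space PC, but that is an interpretation, not something verified here.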

I got this with busybox:
---------------------------------------------------
# setsid cttyhack sh
Data write fault at 0x000003ac in Super Data (pc=0x514e)
** 5915 printk messages dropped **      <== WOW
 [<00047fd8>] printk+0x0/0x16
 [<00004f08>] die_if_kernel+0x3e/0x5e
 [<00005c90>] send_fault_sig+0x7c/0x8e
 [<00005e8c>] do_page_fault+0x1ea/0x1fe
 [<00020000>] _I_CALL_TOP+0xd6c/0x1900
 [<00040040>] sched_slice.isra.10+0x60/0x82
 [<00040400>] check_preempt_wakeup+0xae/0xca
 [<00047fd8>] printk+0x0/0x16
 [<00047fea>] printk+0x12/0x16
 [<0000523c>] buserr_c+0x106/0x5fa
 [<00020000>] _I_CALL_TOP+0xd6c/0x1900
 [<00040040>] sched_slice.isra.10+0x60/0x82
 [<00040400>] check_preempt_wakeup+0xae/0xca
 [<00047fd8>] printk+0x0/0x16
 [<00047fd8>] printk+0x0/0x16
 [<00002778>] buserr+0x20/0x28
 [<00020000>] _I_CALL_TOP+0xd6c/0x1900
<never ending repeat>
---------------------------------------------------
The above was after doing "echo t > sysrq-trigger".

After rebooting and enabling profiling, I got this on each
call to "setsid cttyhack sh":
---------------------------------------------------
*** ILLEGAL INSTRUCTION ***   FORMAT=0
Current process id is 37
BAD KERNEL TRAP: 00000000
PC: [<0012fcc0>] strncpy_from_user+0x5c/0xe4
SR: 2200  SP: 2b92edab  a2: 009c4a00
d0: 00000000    d1: 2f737973    d2: 00000ff0    d3: 00000ff0
d4: 00000000    d5: 00000000    a0: 00000ff0    a1: 80155c8c
Process cttyhack (pid: 37, task=8b255b7a)
Frame format=0
Stack from 009d1f1c:
        80155c04 80155c8c 00000000 8017b101 ffffff49 00851000 009d1f60 0009aa72
        00851010 80155c8c 00000ff0 80155c04 00020000 00000000 eff62d2d 80170f52
        801712bc 009d1f74 0009ab76 80155c8c 00000000 00000000 009d1fac 000910b4
        80155c8c 80155c8c 00020000 00000000 8017b101 eff62d2d 80170f52 eff60002
        00000000 00000004 00000100 00000001 009d1fc4 000911ea ffffff9c 80155c8c
        00020000 00000000 eff62d48 00002874 ffffff9c 80155c8c 00020000 00000000
Call Trace: [<0009aa72>] getname_flags+0x42/0x134
 [<00020000>] _I_CALL_TOP+0xd6c/0x1900
 [<0009ab76>] getname+0x12/0x18
 [<000910b4>] do_sys_open+0xc2/0x1b0
 [<00020000>] _I_CALL_TOP+0xd6c/0x1900
 [<000911ea>] sys_openat+0x22/0x26
 [<00020000>] _I_CALL_TOP+0xd6c/0x1900
 [<00002874>] syscall+0x8/0xc
 [<00020000>] _I_CALL_TOP+0xd6c/0x1900
Code: 9480 7203 b282 645a 2805 0eb1 1000 0800 <4a84> 664e 2781 0800 2801 0084 7f7f 7f7f 2c04 4686 2401 0682 fefe feff c486 6748
WARNING: unexpected (24 > 19) number of CPU data cache misses at 0x2a1e:
$00002a18 : f229            DC.W      $f229
$00002a1e : f229 9c00 041c  fmovem    $41c(a1),fpiar/fpsr/fpcr
<repeated few times>
WARNING: 'invalid' CPU PC profile instruction address 0x800d24da!
<repeated many times with different addresses>
...
> d strncpy_from_user
strncpy_from_user:
$0012fc64 : 4e56 0000                          link      a6,#0
...
$0012fcb8 : 2805                               move.l    d5,d4
$0012fcba : 0eb1 1000 0800                     moves.l   (a1,d0.l),d1
$0012fcc0 : 4a84                               tst.l     d4
---------------------------------------------------

I assume the actual error is on the moves.l instruction at $12fcba.

=> The failing address here is again >2GB.

Running just "setsid" takes a long time to finish, and in the profile
backtrace taken during this I noticed:
---------------------------------------------------
...
- 0x24af90: __schedule (return = 0x24b2c2)
- 0x24b2a8: preempt_schedule_common (return = 0x24b436)
- 0x24b40e: _cond_resched (return = 0x7ef9a)
- 0x7eb8c: unmap_page_range (return = 0x7f03a)
- 0x7eff4: unmap_single_vma (return = 0x7f0f8)
- 0x7f0c2: unmap_vmas (return = 0x83da2)
- 0x83ce4: exit_mmap (return = 0x2558c)
- 0x25570: mmput (return = 0x28a6a)
- 0x2884e: do_exit (return = 0x4f20)
- 0x4eca: die_if_kernel (return = 0x4f98)
- 0x4f28: bad_super_trap (return = 0x50aa)
- 0x5072: trap_c (return = 0x27a0)
- 0x12fc64: strncpy_from_user (return = 0x9aa72)
- 0x9aa30: getname_flags (return = 0x9ab76)
- 0x9ab64: getname (return = 0x910b4)
- 0x90ff2: do_sys_open (return = 0x911ea)
- 0x911c8: sys_openat (return = 0x2874)
- 0x24af90: __schedule (return = 0x24b324)
- 0x24b2da: schedule (return = 0x24cfde)
...
> d getname_flags
getname_flags:
$0009aa30 : 4e56 0000                          link      a6,#0
$0009aa34 : 48e7 381c                          movem.l   d2-d4/a3-a5,-(sp)
$0009aa38 : 262e 0008                          move.l    8(a6),d3
$0009aa3c : 282e 0010                          move.l    $10(a6),d4
$0009aa40 : 2f3c 0060 00c0                     move.l    #$6000c0,-(sp)
$0009aa46 : 2f39 0030 b660                     move.l    $30b660,-(sp)
$0009aa4c : 49f9 0008 c9e8                     lea       $8c9e8,a4
$0009aa52 : 4e94                               jsr       (a4)
$0009aa54 : 2648                               movea.l   a0,a3
$0009aa56 : 508f                               addq.l    #8,sp
$0009aa58 : 4a88                               tst.l     a0
$0009aa5a : 676e                               beq.s     $9aaca
$0009aa5c : 41e8 0010                          lea       $10(a0),a0
$0009aa60 : 2688                               move.l    a0,(a3)
$0009aa62 : 4878 0ff0                          pea       $0ff0.w
$0009aa66 : 2f03                               move.l    d3,-(sp)
$0009aa68 : 2f08                               move.l    a0,-(sp)
$0009aa6a : 4bf9 0012 fc64                     lea       $12fc64,a5
$0009aa70 : 4e95                               jsr       (a5)
$0009aa72 : 2400                               move.l    d0,d2
$0009aa74 : 4fef 000c                          lea       $c(sp),sp
---------------------------------------------------

sys_openat() -> do_sys_open() -> getname() -> getname_flags() ->
strncpy_from_user() is again failing.

do_sys_open() & getname() don't do anything interesting with
the filename.

getname_flags() does the following:
---------------------------------------------------
#define EMBEDDED_NAME_MAX       (PATH_MAX - offsetof(struct filename, iname))

struct filename *
getname_flags(const char __user *filename, int flags, int *empty)
{
        struct filename *result;
        char *kname;
        int len;
        BUILD_BUG_ON(offsetof(struct filename, iname) % sizeof(long) != 0);

        result = audit_reusename(filename);
        if (result)
                return result;

        result = __getname();
        if (unlikely(!result))
                return ERR_PTR(-ENOMEM);

        /*
         * First, try to embed the struct filename inside the names_cache
         * allocation
         */
        kname = (char *)result->iname;
        result->name = kname;

        len = strncpy_from_user(kname, filename, EMBEDDED_NAME_MAX);
---------------------------------------------------

So either the buffer from __getname() (result->iname) or the filename
coming from sys_openat() is wrong, or something goes wrong in
strncpy_from_user() itself.
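
One way to narrow this down is to cross-check the strncpy_from_user() arguments against the dumps above. The sketch below models struct filename with 32-bit fields (an assumption: four 4-byte members before iname, as on a 32-bit kernel) and compares the result with the "lea $10(a0),a0" and "pea $0ff0.w" in the getname_flags disassembly. The struct name, macro names and helper are illustrative only, not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PATH_MAX_LINUX 4096u   /* PATH_MAX on Linux */

/* Assumed 32-bit layout: name, uptr, refcnt, aname, then iname[]. */
struct filename32 {
    uint32_t name;    /* const char * */
    uint32_t uptr;    /* const char __user * */
    int32_t  refcnt;
    uint32_t aname;   /* struct audit_names * */
    char     iname[]; /* embedded name buffer */
};

#define EMBEDDED_NAME_MAX32 \
    (PATH_MAX_LINUX - offsetof(struct filename32, iname))

/* Same idea as the BUILD_BUG_ON() in getname_flags(). */
static_assert(offsetof(struct filename32, iname) % sizeof(uint32_t) == 0,
              "iname must be longword aligned");

/*
 * Return nonzero if dst is the embedded buffer of the object at "result"
 * and count equals the embedded-name limit.
 */
int args_look_consistent(uint32_t result, uint32_t dst, uint32_t count)
{
    return dst == result + offsetof(struct filename32, iname) &&
           count == EMBEDDED_NAME_MAX32;
}
```

With the values from the oops stack dump (00851010 and 00000ff0 following the return address 0009aa72, and assuming the 00851000 just before the saved frame pointer is the saved result pointer), args_look_consistent(0x00851000, 0x00851010, 0x00000ff0) holds, and the remaining argument 80155c8c matches a1. That would suggest the kernel buffer and length are sane, leaving the user pointer or the copy itself as suspects.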

__getname() is:
include/linux/fs.h:#define __getname() kmem_cache_alloc(names_cachep, GFP_KERNEL)

and "names_cachep" is initialized in:
---------------------------------------------------
/* SLAB cache for __getname() consumers */
struct kmem_cache *names_cachep __read_mostly;
EXPORT_SYMBOL(names_cachep);
...
void __init vfs_caches_init(void)
{
        names_cachep = kmem_cache_create_usercopy("names_cache", PATH_MAX, 0,
                        SLAB_HWCACHE_ALIGN|SLAB_PANIC, 0, PATH_MAX, NULL);
...
}
---------------------------------------------------

m68k defaults to the SLUB allocator, so the code is in slub.c
(instead of the slab.c used on more common architectures).

A quick look at that didn't show anything that could return negative
(>2GB) values; it only returns kmem_cache_cpu->freelist values, and if
those were wrong, I would expect more issues.

=> I will test the other allocators.  Should they work?

I'll also try strace.  Does ptrace() work OK on m68k?

	- Eero