Hi,
On 4/11/19 5:10 AM, Michael Schmitz wrote:
> On 10/04/19 10:07 AM, Michael Schmitz wrote:
>> The situation we encounter here (kthread->data == NULL) seems to have
>> been anticipated by the designers of this 'speculative' read of
>> kthread data. We still take a bus error, even though
>> __probe_kernel_read() attempts to suppress that. Unfortunately, our
>> bus_error030() is agnostic to that (at least in the corner case of
>> either invalid MMU descriptor or write protect faults).
> OK, I decided to bite the bullet and modify bus_error030() to allow
> falling through to do_page_fault if an invalid page read happens while
> page faults are disabled.
...
> You may want to give this a spin, to see whether it fixes your syscall
> errors.
I tried the bus_error030() patch.
I still sometimes get:
---------------------------------------------------
Data read fault at 0x801740c4 in Super Data (pc=0x2918)
BAD KERNEL BUSERR
Oops: 00000000
PC: [<00002918>] auto_inthandler+0x0/0x28
SR: 2400 SP: (ptrval) a2: 8017a478
d0: 00000018 d1: 0000001a d2: 00000000 d3: 8017a480
d4: 00000018 d5: 00000054 a0: 8017a480 a1: 8016f84c
Process exe (pid: 26, task=(ptrval))
Frame format=B ssw=0345 isc=2f00 isb=48e7 daddr=801740c4 dobuf=80176e58
baddr=801740c4 dibuf=801740c4 ver=0
Stack from 009b9ff8:
02088001 ebf40070
Call Trace:
Code: 0005 61ff 0002 c926 508f 588f 60a2 0000 <42a7> 4878 ffff 2f00 48e7
7ce0 200f 0280 ffff e000 2440 2452 e9ef 010a 0032 0440
...
> d auto_inthandler
auto_inthandler:
$00002918 : 42a7 clr.l -(sp)
$0000291a : 4878 ffff pea $ffffffff.w
$0000291e : 2f00 move.l d0,-(sp)
$00002920 : 48e7 7ce0 movem.l d1-d5/a0-a2,-(sp)
---------------------------------------------------
Is that a >2GB address on the stack, which the autovector handler tries to clear?
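The ssw values in these oopses can be decoded by hand to cross-check the fault type. The following is a minimal sketch, assuming the 68020/68030 special status word masks as they are believed to appear in the kernel's asm/traps.h (DF = 0x0100, RW = 0x0040, function code in the low three bits). The helper names are made up for illustration and are not kernel API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed 68020/68030 SSW bit masks (believed to match asm/traps.h). */
#define SSW_DF  0x0100u  /* data cycle fault */
#define SSW_RW  0x0040u  /* 1 = read, 0 = write */
#define SSW_DFC 0x0007u  /* function code of the faulting data access */

/* Hypothetical helpers for reading an oops by hand. */
static bool ssw_is_data_fault(uint16_t ssw) { return (ssw & SSW_DF) != 0; }
static bool ssw_is_read(uint16_t ssw) { return (ssw & SSW_RW) != 0; }
static unsigned ssw_function_code(uint16_t ssw) { return ssw & SSW_DFC; }

/* True if bit 31 is set, i.e. the address lies at or above 2GB. */
static bool above_2gb(uint32_t addr) { return (addr & 0x80000000u) != 0; }

/* m68k function code names; unlisted codes are treated as reserved here. */
static const char *fc_name(unsigned fc)
{
    switch (fc) {
    case 1: return "User Data";
    case 2: return "User Program";
    case 5: return "Super Data";
    case 6: return "Super Program";
    case 7: return "CPU";
    default: return "reserved";
    }
}
```

Under these masks, ssw=0345 decodes as a data fault, read, function code 5 ("Super Data"), which agrees with the "Data read fault ... in Super Data" line, and above_2gb(0x801740c4) is true for the reported daddr. The ssw=0301 of a format A frame would decode as a data fault, write, function code 1.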
And sometimes this:
---------------------------------------------------
Run /init as init process
Unable to handle kernel access at virtual address (ptrval)
Oops: 00000000
PC: [<00002954>] user_inthandler+0x8/0x20
SR: 2d04 SP: (ptrval) a2: 00000000
d0: 00000000 d1: ffffffff d2: 00000000 d3: 00000000
d4: 00000001 d5: 00000001 a0: efd4ef3c a1: 80057d14
Process exe (pid: 25, task=(ptrval))
Frame format=A ssw=0301 isc=200f isb=0280 daddr=00ff2f00 dobuf=00000000
Stack from 009b9ff8:
02048005 7da40114
Call Trace:
Code: 508f 60ff ffff ff64 42a7 4878 ffff 2f00 <48e7> 7ce0 200f 0280 ffff
e000 2440 2452 e9ef 010a 0032 0440 0038 2f0f 2f00 4eb9
Disabling lock debugging due to kernel taint
Segmentation fault
...
> d user_inthandler
user_inthandler:
$0000294c : 42a7 clr.l -(sp)
$0000294e : 4878 ffff pea $ffffffff.w
$00002952 : 2f00 move.l d0,-(sp)
$00002954 : 48e7 7ce0 movem.l d1-d5/a0-a2,-(sp)
---------------------------------------------------
Here the problem is the second value on the stack.
Maybe a missing auto-increment, when an instruction using auto-increment
addressing page faults, causes issues later if the resulting bus errors
are ignored.
=> I'll discuss this on the Hatari list
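To make the hypothesis concrete, here is a toy model of a push through -(sp), the addressing mode the interrupt handlers above use. It is purely illustrative: the struct, the rollback flag and the "restart the instruction" behaviour are assumptions for the sketch, and it claims nothing about how Hatari or a real 68030 (which resumes from the state saved in the format A/B frame) actually behaves.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical emulator state: only the stack pointer matters here. */
struct toy_cpu {
    uint32_t sp;
    bool rollback_on_fault; /* assumed policy: undo side effects on a fault */
};

/* One attempt at "move.l d0,-(sp)". Returns false if the write faulted. */
static bool toy_push(struct toy_cpu *cpu, bool page_present)
{
    uint32_t old_sp = cpu->sp;

    cpu->sp -= 4; /* the predecrement happens before the write */
    if (!page_present) {
        if (cpu->rollback_on_fault)
            cpu->sp = old_sp; /* restore sp before reporting the fault */
        return false;
    }
    return true; /* the store itself is not modelled */
}

/* Fault once, pretend the page got fixed, then run the push again. */
static uint32_t toy_push_with_one_fault(uint32_t sp, bool rollback)
{
    struct toy_cpu cpu = { .sp = sp, .rollback_on_fault = rollback };

    if (!toy_push(&cpu, false))
        toy_push(&cpu, true);
    return cpu.sp;
}
```

In this model, starting from sp = 0x1000, the rollback policy ends at 0x0ffc (one longword pushed), while the no-rollback policy ends at 0x0ff8, so the stack is off by one longword. A register update that is applied one time too few or too many around a fault is the kind of error that could leave a wrong value on the stack.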
I got this with busybox:
---------------------------------------------------
# setsid cttyhack sh
Data write fault at 0x000003ac in Super Data (pc=0x514e)
** 5915 printk messages dropped ** <== WOW
[<00047fd8>] printk+0x0/0x16
[<00004f08>] die_if_kernel+0x3e/0x5e
[<00005c90>] send_fault_sig+0x7c/0x8e
[<00005e8c>] do_page_fault+0x1ea/0x1fe
[<00020000>] _I_CALL_TOP+0xd6c/0x1900
[<00040040>] sched_slice.isra.10+0x60/0x82
[<00040400>] check_preempt_wakeup+0xae/0xca
[<00047fd8>] printk+0x0/0x16
[<00047fea>] printk+0x12/0x16
[<0000523c>] buserr_c+0x106/0x5fa
[<00020000>] _I_CALL_TOP+0xd6c/0x1900
[<00040040>] sched_slice.isra.10+0x60/0x82
[<00040400>] check_preempt_wakeup+0xae/0xca
[<00047fd8>] printk+0x0/0x16
[<00047fd8>] printk+0x0/0x16
[<00002778>] buserr+0x20/0x28
[<00020000>] _I_CALL_TOP+0xd6c/0x1900
<never ending repeat>
---------------------------------------------------
The above was after doing "echo t > sysrq-trigger".
After rebooting and enabling profiling, I got this on each
call to "setsid cttyhack sh":
---------------------------------------------------
*** ILLEGAL INSTRUCTION *** FORMAT=0
Current process id is 37
BAD KERNEL TRAP: 00000000
PC: [<0012fcc0>] strncpy_from_user+0x5c/0xe4
SR: 2200 SP: 2b92edab a2: 009c4a00
d0: 00000000 d1: 2f737973 d2: 00000ff0 d3: 00000ff0
d4: 00000000 d5: 00000000 a0: 00000ff0 a1: 80155c8c
Process cttyhack (pid: 37, task=8b255b7a)
Frame format=0
Stack from 009d1f1c:
80155c04 80155c8c 00000000 8017b101 ffffff49 00851000 009d1f60 0009aa72
00851010 80155c8c 00000ff0 80155c04 00020000 00000000 eff62d2d 80170f52
801712bc 009d1f74 0009ab76 80155c8c 00000000 00000000 009d1fac 000910b4
80155c8c 80155c8c 00020000 00000000 8017b101 eff62d2d 80170f52 eff60002
00000000 00000004 00000100 00000001 009d1fc4 000911ea ffffff9c 80155c8c
00020000 00000000 eff62d48 00002874 ffffff9c 80155c8c 00020000 00000000
Call Trace: [<0009aa72>] getname_flags+0x42/0x134
[<00020000>] _I_CALL_TOP+0xd6c/0x1900
[<0009ab76>] getname+0x12/0x18
[<000910b4>] do_sys_open+0xc2/0x1b0
[<00020000>] _I_CALL_TOP+0xd6c/0x1900
[<000911ea>] sys_openat+0x22/0x26
[<00020000>] _I_CALL_TOP+0xd6c/0x1900
[<00002874>] syscall+0x8/0xc
[<00020000>] _I_CALL_TOP+0xd6c/0x1900
Code: 9480 7203 b282 645a 2805 0eb1 1000 0800 <4a84> 664e 2781 0800 2801
0084 7f7f 7f7f 2c04 4686 2401 0682 fefe feff c486 6748
WARNING: unexpected (24 > 19) number of CPU data cache misses at 0x2a1e:
$00002a18 : f229 DC.W $f229
$00002a1e : f229 9c00 041c fmovem $41c(a1),fpiar/fpsr/fpcr
<repeated few times>
WARNING: 'invalid' CPU PC profile instruction address 0x800d24da!
<repeated many times with different addresses>
...
> d strncpy_from_user
strncpy_from_user:
$0012fc64 : 4e56 0000 link a6,#0
...
$0012fcb8 : 2805 move.l d5,d4
$0012fcba : 0eb1 1000 0800 moves.l (a1,d0.l),d1
$0012fcc0 : 4a84 tst.l d4
---------------------------------------------------
I assume the actual error is on the moves.l instruction at $12fcba.
=> The failing address here is again >2GB.
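The register dump allows a quick cross-check of that moves.l. The sketch below computes the effective address of "(a1,d0.l)" from the dumped a1 and d0, and renders d1 as big-endian bytes. The helper names are hypothetical, and it assumes a scale of 1 and zero displacement, as the extension word 0800 in the disassembly suggests.

```c
#include <stdint.h>

/* Effective address of "(a1,d0.l)": base plus 32-bit index, modulo 2^32. */
static uint32_t ea_indexed(uint32_t base, uint32_t index)
{
    return base + index;
}

/*
 * Render a 32-bit register as four bytes in m68k (big-endian) order.
 * Non-printable bytes become '.'; out must hold at least 5 chars.
 */
static void reg_to_ascii(uint32_t reg, char out[5])
{
    for (int i = 0; i < 4; i++) {
        unsigned char c = (unsigned char)(reg >> (24 - 8 * i));

        out[i] = (c >= 0x20 && c < 0x7f) ? (char)c : '.';
    }
    out[4] = '\0';
}
```

With a1 = 0x80155c8c and d0 = 0, the effective address is 0x80155c8c, which has bit 31 set. The dumped d1 = 0x2f737973 reads as "/sys", the start of a path. Whether that means the load had already completed when the trap was taken is not established by the dump alone.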
Running just "setsid" takes a long time to finish, and in the profile
backtrace taken during it I noticed:
---------------------------------------------------
...
- 0x24af90: __schedule (return = 0x24b2c2)
- 0x24b2a8: preempt_schedule_common (return = 0x24b436)
- 0x24b40e: _cond_resched (return = 0x7ef9a)
- 0x7eb8c: unmap_page_range (return = 0x7f03a)
- 0x7eff4: unmap_single_vma (return = 0x7f0f8)
- 0x7f0c2: unmap_vmas (return = 0x83da2)
- 0x83ce4: exit_mmap (return = 0x2558c)
- 0x25570: mmput (return = 0x28a6a)
- 0x2884e: do_exit (return = 0x4f20)
- 0x4eca: die_if_kernel (return = 0x4f98)
- 0x4f28: bad_super_trap (return = 0x50aa)
- 0x5072: trap_c (return = 0x27a0)
- 0x12fc64: strncpy_from_user (return = 0x9aa72)
- 0x9aa30: getname_flags (return = 0x9ab76)
- 0x9ab64: getname (return = 0x910b4)
- 0x90ff2: do_sys_open (return = 0x911ea)
- 0x911c8: sys_openat (return = 0x2874)
- 0x24af90: __schedule (return = 0x24b324)
- 0x24b2da: schedule (return = 0x24cfde)
...
> d getname_flags
getname_flags:
$0009aa30 : 4e56 0000 link a6,#0
$0009aa34 : 48e7 381c movem.l d2-d4/a3-a5,-(sp)
$0009aa38 : 262e 0008 move.l 8(a6),d3
$0009aa3c : 282e 0010 move.l $10(a6),d4
$0009aa40 : 2f3c 0060 00c0 move.l #$6000c0,-(sp)
$0009aa46 : 2f39 0030 b660 move.l $30b660,-(sp)
$0009aa4c : 49f9 0008 c9e8 lea $8c9e8,a4
$0009aa52 : 4e94 jsr (a4)
$0009aa54 : 2648 movea.l a0,a3
$0009aa56 : 508f addq.l #8,sp
$0009aa58 : 4a88 tst.l a0
$0009aa5a : 676e beq.s $9aaca
$0009aa5c : 41e8 0010 lea $10(a0),a0
$0009aa60 : 2688 move.l a0,(a3)
$0009aa62 : 4878 0ff0 pea $0ff0.w
$0009aa66 : 2f03 move.l d3,-(sp)
$0009aa68 : 2f08 move.l a0,-(sp)
$0009aa6a : 4bf9 0012 fc64 lea $12fc64,a5
$0009aa70 : 4e95 jsr (a5)
$0009aa72 : 2400 move.l d0,d2
$0009aa74 : 4fef 000c lea $c(sp),sp
---------------------------------------------------
sys_openat() -> do_sys_open() -> getname() -> getname_flags() ->
strncpy_from_user() is again failing.
do_sys_open() & getname() don't do anything interesting with
the filename.
getname_flags() does the following:
---------------------------------------------------
#define EMBEDDED_NAME_MAX	(PATH_MAX - offsetof(struct filename, iname))

struct filename *
getname_flags(const char __user *filename, int flags, int *empty)
{
	struct filename *result;
	char *kname;
	int len;

	BUILD_BUG_ON(offsetof(struct filename, iname) % sizeof(long) != 0);

	result = audit_reusename(filename);
	if (result)
		return result;

	result = __getname();
	if (unlikely(!result))
		return ERR_PTR(-ENOMEM);

	/*
	 * First, try to embed the struct filename inside the names_cache
	 * allocation
	 */
	kname = (char *)result->iname;
	result->name = kname;

	len = strncpy_from_user(kname, filename, EMBEDDED_NAME_MAX);
---------------------------------------------------
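The EMBEDDED_NAME_MAX value can be checked against the disassembly of getname_flags(). The following is a user-space sketch, assuming struct filename consists of two pointers, an int and another pointer in front of iname[] (the layout is reproduced from memory and should be verified against include/linux/fs.h). The 32-bit fields are modelled as uint32_t so the result does not depend on the host, and the names filename32, PATH_MAX_M68K and embedded_name_max32() are invented for the sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define PATH_MAX_M68K 4096u /* PATH_MAX on Linux */

/* Assumed 32-bit layout of struct filename (pointers are 4 bytes on m68k). */
struct filename32 {
    uint32_t name;   /* const char * */
    uint32_t uptr;   /* const char __user * */
    uint32_t refcnt; /* int */
    uint32_t aname;  /* struct audit_names * */
    char iname[];    /* embedded name buffer */
};

/* Mirrors EMBEDDED_NAME_MAX for the assumed 32-bit layout. */
static size_t embedded_name_max32(void)
{
    return PATH_MAX_M68K - offsetof(struct filename32, iname);
}
```

Under this layout iname sits at offset 0x10 and the limit comes out as 0xff0, matching the "lea $10(a0),a0" and "pea $0ff0.w" instructions in the getname_flags() disassembly, so the length argument at least looks as expected.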
So either the buffer returned by __getname() (result->iname), or the
filename coming from sys_openat(), is wrong, or something goes wrong in
strncpy_from_user() itself.
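One way to narrow this down is to pick the strncpy_from_user() arguments out of the stack dump of the "BAD KERNEL TRAP" oops. The sketch below assumes that the three longwords following the return address 0x0009aa72 are the arguments in (dst, src, count) order, which is the order implied by the pushes in the getname_flags() disassembly. The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* First 11 longwords of the "Stack from 009d1f1c" dump. */
static const uint32_t stack_words[] = {
    0x80155c04, 0x80155c8c, 0x00000000, 0x8017b101,
    0xffffff49, 0x00851000, 0x009d1f60, 0x0009aa72,
    0x00851010, 0x80155c8c, 0x00000ff0,
};

struct call_args {
    uint32_t dst;   /* kernel buffer (kname) */
    uint32_t src;   /* user pointer (filename) */
    uint32_t count; /* maximum length */
    int found;      /* non-zero if the return address was located */
};

/* Return the three longwords that follow ret_addr in the dump, if any. */
static struct call_args args_after_return(const uint32_t *words, size_t n,
                                          uint32_t ret_addr)
{
    struct call_args a = { 0, 0, 0, 0 };

    for (size_t i = 0; i + 3 < n; i++) {
        if (words[i] == ret_addr) {
            a.dst = words[i + 1];
            a.src = words[i + 2];
            a.count = words[i + 3];
            a.found = 1;
            break;
        }
    }
    return a;
}
```

Read this way, dst is 0x00851010, which is 0x10 past the 0x00851000 that also appears in the dump, src is 0x80155c8c (the same value as a1), and count is 0xff0. If that reading is right, the kernel-side buffer looks plausible and the >2GB value is the user pointer that was passed in.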
__getname() is defined as:
include/linux/fs.h:#define __getname() kmem_cache_alloc(names_cachep, GFP_KERNEL)
and "names_cachep" is initialized in:
---------------------------------------------------
/* SLAB cache for __getname() consumers */
struct kmem_cache *names_cachep __read_mostly;
EXPORT_SYMBOL(names_cachep);
...
void __init vfs_caches_init(void)
{
	names_cachep = kmem_cache_create_usercopy("names_cache",
			PATH_MAX, 0,
			SLAB_HWCACHE_ALIGN|SLAB_PANIC, 0, PATH_MAX, NULL);
	...
}
---------------------------------------------------
m68k defaults to the SLUB allocator, so the code is in slub.c
(instead of slab.c used on more common architectures).
A quick look at that didn't show anything that could return negative
values; it only returns kmem_cache_cpu->freelist values, and if
those were wrong, I would expect more issues.
=> I will test the other allocators. Should they work?
I'll also try strace. Does ptrace() work OK on m68k?
- Eero