Hi Finn,
On 3/05/23 20:02, Finn Thain wrote:
On Wed, 3 May 2023, Michael Schmitz wrote:
I haven't yet tried to write code to demonstrate the theoretical
address error issue but I can attempt that if need be. However, such
code would be moot if this patch is going to be required anyway, just
to fix the bus error case...
No, given the coprocessor conditional branch case, we want this patch
even if we decide to handle data faults differently.
That is, unless Andreas can come up with a reason why calculated branch
target addresses cannot be used with these coprocessor branch
instructions?
Moreover, can Andreas or Geert come up with a better way to fix the actual
bus error bug (never mind the theoretical address error bug) than this same
patch (which just happens to work for both)?
In terms of fixing the bus error bug, I think we'll need to fix this
regression first (with only your first patch applied):
[16472.250000] Unable to handle kernel access at virtual address 409fa84e
[16472.300000] Oops: 00000000
[16472.340000] Modules linked in: ne 8390p
[16472.400000] PC: [<00003b1a>] setup_frame+0x6e/0x1f8
[16472.440000] SR: 2204 SP: dadced50 a2: 006020d0
[16472.490000] d0: 00000000 d1: 00000000 d2: 00000000 d3: 00000000
[16472.520000] d4: 00c85f6c d5: 00003918 a0: c043dec8 a1: 006020d0
[16472.550000] Process stress-ng (pid: 3239, task=362f381a)
[16472.590000] Frame format=A ssw=0709 isc=0032 isb=0280 daddr=c043decc dobuf=0000000e
[16472.640000] Stack from 00c85da8:
[16472.640000] 00c85f6c 00000000 00000ca3 00000000 80088568 8008c4d4 00c85fcc 0000381c
[16472.640000] effff6fc 00c85fa4 00c85dcc 00c85e84 000c8536 00c85e90 00000001 00000001
[16472.640000] 000c84fe 00c85e84 00c85e6e 00c85e84 000bae1e 00c85e90 00000001 00000001
[16472.640000] 00c85f00 005ba83c 00c85e6e 00c85e6e 00680484 005ba7f8 00536b70 00982700
[16472.640000] 0001ecb4 0080ac20 00c85f80 0001f2b0 00c85ea8 0000000e 00536b70 000ce0bc
[16472.640000] 0080ac20 00536b70 0001ecb4 0000000e 00981a34 006023ce 00536b70 0001ecb4
[16472.900000] Call Trace: [<0000381c>] test_ti_thread_flag+0x0/0x24
[16472.940000] [<000c8536>] free_pages_and_swap_cache+0x38/0x40
[16472.970000] [<000c84fe>] free_pages_and_swap_cache+0x0/0x40
[16473.010000] [<000bae1e>] tlb_flush_mmu+0x80/0x96
[16473.060000] [<0001ecb4>] __sigqueue_free+0x34/0x3a
[16473.100000] [<0001f2b0>] next_signal+0x0/0x54
[16473.150000] [<000ce0bc>] kmem_cache_free+0x4a/0x56
[16473.190000] [<0001ecb4>] __sigqueue_free+0x34/0x3a
[16473.250000] [<0001ecb4>] __sigqueue_free+0x34/0x3a
[16473.300000] [<0001f0e4>] recalc_sigpending+0x6/0x1e
[16473.350000] [<0001f388>] dequeue_signal+0x84/0x130
[16473.390000] [<00021402>] do_signal_stop+0x0/0x154
[16473.430000] [<002db200>] mt_destroy_walk+0x14e/0x160
[16473.480000] [<00021a06>] get_signal+0x3d8/0x4f6
[16473.530000] [<00021b06>] get_signal+0x4d8/0x4f6
[16473.580000] [<0000381c>] test_ti_thread_flag+0x0/0x24
[16473.620000] [<00004536>] do_notify_resume+0x3b2/0x480
[16473.670000] [<00002000>] _start+0x0/0x8
[16473.720000] [<00002bfa>] do_IRQ+0x26/0x32
[16473.750000] [<00002af4>] auto_irqhandler_fixup+0x4/0xc
[16473.800000] [<00002204>] do_one_initcall+0xa8/0x188
[16473.850000] [<002db0b2>] mt_destroy_walk+0x0/0x160
[16473.900000] [<00002aa0>] do_signal_return+0x10/0x1a
[16473.940000] [<00002a26>] syscall+0x8/0xc
[16473.980000] [<00002000>] _start+0x0/0x8
[16474.010000]
[16474.070000] Code: 6002 4280 4281 2401 0eab 6800 0004 8480 <302c> 0032 0280 0000 0fff 2601 0eab 0800 0008 8483 761c d68b 0eab 3800 000c 8481
[16474.210000] Disabling lock debugging due to kernel taint
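For reference, the ssw field of the Format A frame above can be decoded with the 68020/030 special status word masks. This is a minimal sketch, assuming the bit layout defined in the kernel's arch/m68k/include/asm/traps.h (DF = 0x0100, RW = 0x0040 meaning read when set, SZ = 0x0030, DFC = 0x0007); the names ssw_info, decode_ssw and check_oops_ssw are hypothetical helpers, not kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* 68020/68030 special status word (SSW) masks, assumed to match
 * arch/m68k/include/asm/traps.h. Only the fields decoded below are listed. */
#define SSW_DF  0x0100u /* fault/rerun flag for the data cycle */
#define SSW_RW  0x0040u /* data cycle direction: 1 = read, 0 = write */
#define SSW_SZ  0x0030u /* data cycle size field */
#define SSW_DFC 0x0007u /* function code (address space) of the data cycle */

/* Hypothetical helper type holding the decoded data-cycle fields. */
struct ssw_info {
	int data_fault;	/* DF set: the data cycle faulted */
	int is_read;	/* RW set: the faulting cycle was a read */
	unsigned size;	/* raw SZ field, not interpreted here */
	unsigned fc;	/* 1 = user data, 5 = supervisor data */
};

static struct ssw_info decode_ssw(uint16_t ssw)
{
	struct ssw_info info;

	info.data_fault = (ssw & SSW_DF) != 0;
	info.is_read = (ssw & SSW_RW) != 0;
	info.size = (ssw & SSW_SZ) >> 4;
	info.fc = ssw & SSW_DFC;
	return info;
}

/* Decode the value reported in the oops: "ssw=0709". */
static void check_oops_ssw(void)
{
	struct ssw_info info = decode_ssw(0x0709);

	assert(info.data_fault == 1);	/* a data cycle faulted */
	assert(info.is_read == 0);	/* ... and it was a write */
	assert(info.fc == 1);		/* ... to user data space */
}
```

On these assumptions, 0x0709 decodes as a faulting data-cycle write (DF set, RW clear) with function code 1, i.e. user data space.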
objdump -d of setup_frame:
00003aac <setup_frame>:
3aac: 4fef fee4 lea %sp@(-284),%sp
3ab0: 48e7 3f1e moveml %d2-%d7/%a3-%fp,%sp@-
3ab4: 282f 0148 movel %sp@(328),%d4
3ab8: 2c6f 0150 moveal %sp@(336),%fp
3abc: 284e moveal %fp,%a4
3abe: d9ee 0028 addal %fp@(40),%a4
3ac2: e9ec 0004 0032 bfextu %a4@(50),0,4,%d0
3ac8: 2a70 0db0 002f moveal @(00000000002fa214,%d0:l:4),%a5
3ace: a214
3ad0: 2044 moveal %d4,%a0
3ad2: 2c28 0034 movel %a0@(52),%d6
3ad6: 4a8d tstl %a5
3ad8: 6c06 bges 3ae0 <setup_frame+0x34>
3ada: 70f2 moveq #-14,%d0
3adc: 6000 01bc braw 3c9a <dbl_thresh+0x99>
3ae0: 486d 0138 pea %a5@(312)
3ae4: 2f04 movel %d4,%sp@-
3ae6: 4eba fe7c jsr %pc@(3964 <get_sigframe>)
3aea: 2648 moveal %a0,%a3
3aec: 508f addql #8,%sp
3aee: 2a3c 0000 3918 movel #14616,%d5 <= &raw_copy_to_user
3af4: 4a8d tstl %a5
3af6: 6714 beqs 3b0c <setup_frame+0x60>
3af8: 2f0d movel %a5,%sp@-
3afa: 486e 0034 pea %fp@(52)
3afe: 4868 0138 pea %a0@(312)
3b02: 2245 moveal %d5,%a1
3b04: 4e91 jsr %a1@ <= raw_copy_to_user()
3b06: 4fef 000c lea %sp@(12),%sp
3b0a: 6002 bras 3b0e <setup_frame+0x62>
3b0c: 4280 clrl %d0
3b0e: 4281 clrl %d1
3b10: 2401 movel %d1,%d2
3b12: 0eab 6800 0004 movesl %d6,%a3@(4)
3b18: 8480 orl %d0,%d2
3b1a: 302c 0032 movew %a4@(50),%d0 <= fault PC
3b1e: 0280 0000 0fff andil #4095,%d0
3b24: 2601 movel %d1,%d3
3b26: 0eab 0800 0008 movesl %d0,%a3@(8)
3b2c: 8483 orl %d3,%d2
3b2e: 761c moveq #28,%d3
3b30: d68b addl %a3,%d3
3b32: 0eab 3800 000c movesl %d3,%a3@(12)
3b38: 8481 orl %d1,%d2
3b3a: 4878 0004 pea 4 <CC3_CLRE_I>
3b3e: 206f 0150 moveal %sp@(336),%a0
3b42: 4868 0004 pea %a0@(4)
As far as I can make out, that corresponds to this line:
static int setup_frame(struct ksignal *ksig, sigset_t *set,
			struct pt_regs *regs)
{
	struct sigframe __user *frame;
	struct pt_regs *tregs = rte_regs(regs);
	int fsize = frame_extra_sizes(tregs->format);
	struct sigcontext context;
	int err = 0, sig = ksig->sig;

	if (fsize < 0) {
		pr_debug("setup_frame: Unknown frame format %#x\n",
			 tregs->format);
		return -EFAULT;
	}

	frame = get_sigframe(ksig, sizeof(*frame) + fsize);

	if (fsize)
		err |= copy_to_user (frame + 1, regs + 1, fsize);

	err |= __put_user(sig, &frame->sig);

	err |= __put_user(tregs->vector, &frame->code); <====
	err |= __put_user(&frame->sc, &frame->psc);
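To make the source-to-disassembly mapping explicit: %fp holds regs, "addal %fp@(40),%a4" is rte_regs() adding regs->stkadj, and the 16-bit word at %a4@(50) holds tregs->format (top 4 bits, fetched by bfextu) and tregs->vector (low 12 bits, fetched by movew/andil at the fault PC). A host-side sketch, assuming the non-ColdFire pt_regs layout from arch/m68k/include/uapi/asm/ptrace.h; the struct name m68k_pt_regs_model, the fmt_vec field and the helper functions are hypothetical, and the packed attribute is a GCC/Clang extension used to mimic m68k's 2-byte alignment of 32-bit members.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-side model of the non-ColdFire m68k struct pt_regs.
 * The format:4/vector:12 bitfields are modeled as one 16-bit word,
 * because bitfield bit order differs between big- and little-endian hosts. */
struct m68k_pt_regs_model {
	int32_t d1, d2, d3, d4, d5;
	int32_t a0, a1, a2;
	int32_t d0;
	int32_t orig_d0;
	int32_t stkadj;		/* added to regs by rte_regs() */
	uint16_t sr;
	uint32_t pc;		/* 2-byte aligned on m68k, hence packed */
	uint16_t fmt_vec;	/* format in bits 15-12, vector in bits 11-0 */
} __attribute__((packed));

/* tregs->format: what "bfextu %a4@(50),0,4,%d0" extracts. */
static unsigned frame_format(uint16_t fmt_vec)
{
	return fmt_vec >> 12;
}

/* tregs->vector: what "movew %a4@(50),%d0; andil #4095,%d0" computes. */
static unsigned frame_vector(uint16_t fmt_vec)
{
	return fmt_vec & 0x0fff;
}

static void check_layout(void)
{
	/* "addal %fp@(40),%a4": tregs = (void *)regs + regs->stkadj */
	assert(offsetof(struct m68k_pt_regs_model, stkadj) == 40);
	/* "movew %a4@(50),%d0": the format/vector word */
	assert(offsetof(struct m68k_pt_regs_model, fmt_vec) == 50);
}
```

The offsets asserted in check_layout() are the same displacements that appear in the objdump listing, which is what ties the fault PC to the tregs->vector access.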
It happened during this stressor:
running --sigsegv 2 -t 300 --timestamp --no-rand-seed --times
stress-ng: 09:18:25.65 info: [3310] setting to a 300 second (5 mins, 0.00 secs) run per stressor
stress-ng: 09:18:25.80 info: [3310] dispatching hogs: 2 sigsegv
Timeout, server hobbes not responding.
I'll try with your second patch applied as well. I hadn't seen any
regressions with a patch that indiscriminately added a 256-byte gap,
though, so I'm sure the second patch itself is OK.
Cheers,
Michael