Re: core dump analysis, was Re: stack smashing detected

Hi Finn,

On 23/04/23 19:46, Finn Thain wrote:
On Wed, 19 Apr 2023, Michael Schmitz wrote:

I wonder what we'd see if we patched the kernel to log every user data
write fault caused by a MOVEM instruction. I'll try to code that up.
If these instructions did always cause stack corruption on 030, I think
we would have noticed long ago?

I think it probably was noticed long ago, in the form of rare userland
crashes on 68030. But it was probably never reported because the actual
culprit is too distant from the symptoms.

But I take your point -- signal delivery seems to be crucial. Would it be
difficult to skip signal delivery following a bus error? Perhaps there's
no need to try that experiment, as we know what would happen.
Shouldn't be too hard, see my other mail.
I will take a look at your modified test program and try to use the output
to figure out the stack gymnastics.

IIUC, there are two RTEs following the page fault. The first one runs the
signal handler, the second one resumes the MOVEM that faulted. Maybe we'll
have to intercept the latter (at do_sigreturn() perhaps?) and examine that
exception frame.

There's no second RTE as far as I can see. Upon return from buserr_c, the asm buserr handler jumps to ret_from_exception. Because the bus error was taken from user space, ret_from_exception proceeds to resume_userspace, which finds the task info flags field non-zero and jumps to exit_work. With a signal pending, exit_work jumps to do_signal_return, and the signal handler is set up (frame set up to return through the sigreturn trampoline, pc set to the handler, etc.). There is no rte anywhere on that path.

After setting up for the signal handler, we return to resume_userspace; with no further signals pending, we hit RESTORE_ALL, which restores registers from the pt_regs struct on the kernel stack and has the rte instruction at the end. We had earlier set usp to the signal frame and pc to the signal handler, so the handler is what runs once the rte resumes user mode.

When the signal handler exits, sys_sigreturn runs and cleans up the user stack, then returns to the instruction at the pc from the saved exception frame that got us into kernel mode in the first place. This is the moment the moveml instruction resumes.

There should be no difference between ret_from_exception (after buserr) jumping to RESTORE_ALL directly (with the exception frame from the bus error still on the kernel stack) and doing so after the detour through signal handler setup, the signal handler itself and sys_sigreturn cleanup. If the exception frame on the stack were any different from what it ought to be, rte would fail and raise a format error exception.

If the frame was different from the one needed to complete the bus error exception, e.g. one from a trap exception, we'd fail to resume that moveml instruction and do something else instead. Hmmm, that's an interesting fault mode... it might explain why a3 wasn't saved as it ought to have been? Can we 'poison' the user stack area used for the register save on rec() entry with some other pattern, to prove that moveml sometimes does not complete after the bus error?

Cheers,

    Michael






