Re: core dump analysis, was Re: stack smashing detected

Finn Thain <fthain@xxxxxxxxxxxxxx> · Sun, 23 Apr 2023 17:46:07 +1000 (AEST)

On Wed, 19 Apr 2023, Michael Schmitz wrote:

> > I wonder what we'd see if we patched the kernel to log every user data 
> > write fault caused by a MOVEM instruction. I'll try to code that up.
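
Picking MOVEM out of an opcode word looks simple enough. A rough sketch 
only, assuming the first opcode word of the faulting instruction has 
already been fetched (how the kernel would get at it is not shown here, 
and both helper names are made up for illustration):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * MOVEM is encoded as 0100 1D00 1Sxx xxxx, where D is the direction
 * (0 = registers to memory, 1 = memory to registers), S is the size
 * (0 = word, 1 = long) and the low six bits are the EA mode/register.
 */
static bool is_movem(uint16_t opcode)
{
	if ((opcode & 0xFB80) != 0x4880)
		return false;
	/*
	 * EA mode 000 (data register direct) is never MOVEM; with D = 0
	 * those encodings are EXT.W and EXT.L. Other invalid EA modes are
	 * not filtered out in this sketch.
	 */
	return ((opcode >> 3) & 7) != 0;
}

/* Hypothetical helper: true only for the register-to-memory form. */
static bool is_movem_to_memory(uint16_t opcode)
{
	return is_movem(opcode) && !(opcode & 0x0400);
}
```

For example, 0x48E7 (movem.l <regs>,-(sp)) should match the 
register-to-memory test, while 0x4CDF (movem.l (sp)+,<regs>) should not.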

> If these instructions did always cause stack corruption on 030, I think 
> we would have noticed long ago?

I think it probably was noticed long ago, in the form of rare userland 
crashes on 68030. But it was probably never reported because the actual 
culprit is too distant from the symptoms.

But I take your point -- signal delivery seems to be crucial. Would it be 
difficult to skip signal delivery following a bus error? Perhaps there's 
no need to try that experiment, as we know what would happen.

I will take a look at your modified test program and try to use the output 
to figure out the stack gymnastics.

IIUC, there are two RTEs following the page fault. The first one runs the 
signal handler, the second one resumes the MOVEM that faulted. Maybe we'll 
have to intercept the latter (at do_sigreturn() perhaps?) and examine that 
exception frame.
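
For examining that frame, the format/vector word and the SSW should tell 
us most of what we need. A rough sketch only, assuming the two words have 
already been copied out of the frame as raw 16-bit values; the helper 
names are made up for illustration, and the bit values are the 68020/030 
ones (DF and RW as also used in arch/m68k/kernel/traps.c):

```c
#include <stdbool.h>
#include <stdint.h>

#define SSW_DF 0x0100	/* data cycle fault/rerun flag */
#define SSW_RW 0x0040	/* 1 = read cycle, 0 = write cycle */

/* The frame format is the top four bits of the format/vector word. */
static unsigned int frame_format(uint16_t format_vector)
{
	return format_vector >> 12;
}

/* Formats $A and $B are the short and long bus cycle fault frames. */
static bool is_bus_fault_frame(uint16_t format_vector)
{
	unsigned int fmt = frame_format(format_vector);

	return fmt == 0xA || fmt == 0xB;
}

/* True if the SSW describes a faulted data cycle that was a write. */
static bool is_data_write_fault(uint16_t ssw)
{
	return (ssw & SSW_DF) && !(ssw & SSW_RW);
}
```

A format/vector word of 0xB008 would be a long bus cycle fault frame for 
vector 2 (bus error), which is the kind of frame worth dumping here.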