Re: signal delivery, was Re: reliable reproducer

Michael Schmitz <schmitzmic@xxxxxxxxx> · Wed, 26 Apr 2023 16:00:40 +1200

Hi Finn,

On 26.04.2023 at 14:02, Finn Thain wrote:
On Wed, 26 Apr 2023, Michael Schmitz wrote:

Thanks. We had seen evidence that a bus error generated mid-instruction
did leave the USP at the address where the bus fault happened (not where
it was before the instruction started, nor where it would have been once
the instruction completed), and that the operation did not complete
normally after the bus error (at least, the value/address seen in the
exception frame was not stored).

I'm afraid I still don't fully understand how and why the user stack
(rather than the supervisor stack) gets used for processing the exception
frame.

The kernel stack would not be accessible to the signal handler, which
must run in process context (i.e. user space).

The exception frame is copied to the signal frame for informational
purposes only (such as examining the processor state at the time the
signal was taken; not too useful for SIGCHLD, but it could be used to
interpret SIGSEGV).

Finn had also demonstrated that skipping signal delivery on bus errors
eliminates the stack corruption. Your patch achieves the same objective
in a different way, so I'm sure this will work as well.

I had thought the 030 could resume the interrupted instruction using the
information from the exception frame, and that does appear to work in
all cases except where signal delivery gets in the way. It also works if
the exception frame is moved a little further down the stack. So our
treatment of the bus error exception frame during signal delivery
appears to be incorrect.

It seems I got confused about the user and kernel stacks there myself,
and managed to confuse almost everyone else about this bug. Apologies
for the incessant noise.

What matters for the return from exception is an intact frame on the
kernel stack. Anything we do on the user stack (mucking around with the
offset the sigframe is set up at, copying siginfo, ucontext or
sigcontext plus the extra exception frame data) does not change the
kernel stack one whit.
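A toy model of the point above, not kernel code: the two stacks are plain byte arrays in a hypothetical `struct toy_task`, and the 92-byte frame size assumes a 68030 format 0xB frame. It models only the delivery step: copying the frame into a user-stack signal frame, at whatever extra offset, and letting the handler scribble on that copy leaves the kernel-stack frame byte-for-byte intact. What sys_sigreturn later restores is not modelled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_STACK_BYTES 256 /* hypothetical size of each toy stack */
#define TOY_FRAME_BYTES 92  /* assumed 68030 format 0xB frame size */

/* Hypothetical task with separate supervisor and user stacks. */
struct toy_task {
    uint8_t kstack[TOY_STACK_BYTES]; /* supervisor (kernel) stack */
    uint8_t ustack[TOY_STACK_BYTES]; /* user stack */
    size_t ksp;                      /* index of kernel stack top */
    size_t usp;                      /* index of user stack top */
};

/* Exception entry: the frame is pushed onto the kernel stack
 * (both stacks grow towards lower indices). */
static void toy_exception_entry(struct toy_task *t, const uint8_t *frame,
                                size_t len)
{
    assert(len <= t->ksp);
    t->ksp -= len;
    memcpy(&t->kstack[t->ksp], frame, len);
}

/* Signal delivery: copy the kernel-stack frame into a signal frame on the
 * user stack, 'gap' extra bytes further down. Only ustack[] is written. */
static size_t toy_setup_sigframe(struct toy_task *t, size_t len, size_t gap)
{
    assert(len + gap <= t->usp);
    t->usp -= len + gap;
    memcpy(&t->ustack[t->usp], &t->kstack[t->ksp], len);
    return t->usp;
}

/* Returns 1 if the kernel-stack frame is unchanged after the user-stack
 * copy has been made and then overwritten by the "handler". */
static int toy_kernel_frame_survives(size_t gap)
{
    struct toy_task t;
    uint8_t frame[TOY_FRAME_BYTES], before[TOY_FRAME_BYTES];

    memset(&t, 0, sizeof t);
    t.ksp = t.usp = TOY_STACK_BYTES;
    for (size_t i = 0; i < sizeof frame; i++)
        frame[i] = (uint8_t)i;

    toy_exception_entry(&t, frame, sizeof frame);
    memcpy(before, &t.kstack[t.ksp], sizeof before);

    size_t sf = toy_setup_sigframe(&t, sizeof frame, gap);
    memset(&t.ustack[sf], 0xAA, sizeof frame); /* handler clobbers its copy */

    return memcmp(before, &t.kstack[t.ksp], sizeof before) == 0;
}
```

Whatever `gap` is chosen for the sigframe, the result is the same, which is why moving the sigframe around on the user stack cannot by itself repair or damage the kernel-stack frame.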

The mangle_kernel_stack stuff is needed because sys_sigreturn will place
another exception frame on the kernel stack (a four-word frame) that
needs to be replaced by the bus error exception frame (or whichever
frame caused the kernel mode entry prior to signal delivery) before
finally returning from the bus error exception.
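A simplified sketch of the size arithmetic involved, assuming a toy kernel stack held in a byte array; `toy_frame_bytes`, `toy_replace_frame` and `toy_sigreturn_demo` are hypothetical names. The sizes are the 68030 ones (format 0x0 four-word frame 8 bytes, format 0xB frame 92 bytes). The upper end of the frame is taken as fixed, since after RTE pops the frame the stack pointer should land in the same place either way, so swapping the 8-byte frame for a format 0xB frame moves the stack top down by 84 bytes. The real mangle_kernel_stack also has to relocate the saved registers sitting below the frame, which this toy leaves out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Total size in bytes of a few 68030 stack frame formats
 * (0 means "not modelled here"). */
static size_t toy_frame_bytes(unsigned format)
{
    switch (format) {
    case 0x0: return 8;  /* four-word frame (e.g. trap #0) */
    case 0x2: return 12; /* six-word frame */
    case 0xA: return 32; /* short bus cycle fault frame */
    case 0xB: return 92; /* long bus cycle fault frame */
    default:  return 0;
    }
}

/* Replace the frame of format 'old_fmt' at kstack[ksp] with 'new_frame'
 * of format 'new_fmt'. The upper end of the frame stays put, so the stack
 * top moves by the size difference. Returns the new stack top index. */
static size_t toy_replace_frame(uint8_t *kstack, size_t ksp, unsigned old_fmt,
                                const uint8_t *new_frame, unsigned new_fmt)
{
    size_t old_len = toy_frame_bytes(old_fmt);
    size_t new_len = toy_frame_bytes(new_fmt);
    size_t end = ksp + old_len; /* first byte above the old frame */

    assert(old_len != 0 && new_len != 0);
    assert(new_len <= end);     /* room below for the bigger frame */

    size_t new_ksp = end - new_len;
    memcpy(&kstack[new_ksp], new_frame, new_len);
    return new_ksp;
}

/* Demo: an 8-byte format 0x0 frame at index 200 of a 256-byte toy stack is
 * swapped for a format 0xB frame; returns the new stack top index. */
static size_t toy_sigreturn_demo(void)
{
    uint8_t kstack[256] = {0};
    uint8_t fmtb[92];

    memset(fmtb, 0xB0, sizeof fmtb);
    return toy_replace_frame(kstack, 200, 0x0, fmtb, 0xB);
}
```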

Only at that time will the movel instruction that took the bus fault 
resume (and complete its writes correctly, I hope).

Our problem may be that if we take the signal too late, and our main
process inspects a stack that has only been partially saved (because
the bus error processing is still in flight), we appear to be in
trouble. After sys_sigreturn completes, everything will be OK.

I can see this causing the stack error in the test case. Not sure
whether it also applies to the dash case ...

Wouldn't that depend on the exception frame format? Perhaps it is unsafe
to treat every format 0xB exception frame the way we do. If so, what do
we do about address error exceptions, which are supposed to produce
SIGBUS? The Programmer's Reference Manual says "a long bus fault stack
frame may be generated" in this case.
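For reference, the frame format and the exception cause are both encoded in the format/vector word of a 68020/030 exception frame: bits 15-12 hold the format and bits 11-0 the vector offset (vector number times 4). Bus errors use vector 2 and address errors vector 3, so a format 0xB frame can be told apart by its vector. The helper names below are hypothetical, and the checks only illustrate how such a distinction could be made; they are not what the kernel does.

```c
#include <assert.h>
#include <stdint.h>

#define VEC_BUSERR  2 /* bus error (access fault) vector number */
#define VEC_ADDRERR 3 /* address error vector number */

/* Fields of the 68020/030 format/vector word. */
static unsigned frame_format(uint16_t fv)  { return fv >> 12; }
static unsigned vector_offset(uint16_t fv) { return fv & 0x0FFFu; }
static unsigned vector_number(uint16_t fv) { return vector_offset(fv) >> 2; }

/* Hypothetical check: a long bus fault frame (format 0xB) that was raised
 * by a bus error rather than by an address error. */
static int is_fmtb_bus_error(uint16_t fv)
{
    return frame_format(fv) == 0xB && vector_number(fv) == VEC_BUSERR;
}

/* Hypothetical check: a format 0xB frame raised by an address error. */
static int is_fmtb_address_error(uint16_t fv)
{
    return frame_format(fv) == 0xB && vector_number(fv) == VEC_ADDRERR;
}
```

With these, a format/vector word of 0xB008 identifies a bus error and 0xB00C an address error, even though both carry the same long bus fault frame.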

We don't handle access errors (beyond terminating the offending process).

I hope this makes a little more sense now...

Cheers,

	Michael