Re: signal delivery, was Re: reliable reproducer

Michael Schmitz <schmitzmic@xxxxxxxxx> · Sat, 29 Apr 2023 18:01:02 +1200

Hi Finn,

On 29.04.2023 at 17:03, Finn Thain wrote:
On Sat, 29 Apr 2023, Michael Schmitz wrote:

On 29.04.2023 at 12:28, Finn Thain wrote:

Right. If we fix this in the signal handling code, we take care of
address errors as well, which was my concern with Andreas' patch. We
can do what do_page_fault() does and assume the worst (256 bytes?).

Well, we could do that if we could be certain this does not cause a
memory leak in some way. The reason I bring this up is that I've just
seen the kernel that I'd used to run the latest test cases (which
inserts a 20 byte gap only!) run amok terminating pretty much my entire
user space because it ran out of memory. Never seen the like of that.

If the test program ran out of stack space it would not trigger the OOM
killer. So that incident probably has something to do with upgrading your
kernel (?)

Might be, but this has all been on m68k v6.3-rc7, and I hadn't seen the
memory squeeze there before. I'll have to run a few hundred iterations of
the test case on an unpatched v6.3-rc7 and on the one with the minimal
frame gap to be sure though.

Anyway, I agree that stkadj would need to account for the gap, as you
pointed out earlier.

Not sure about that anymore - mangle_kernel_stack() does not even use
stkadj to shift contents on the kernel stack after restoring the
exception frame from the signal stack. Instead, it uses the start address
of the frame for that copy operation, and uses a local buffer to move the
frame from user space to kernel space. It uses the extra frame size from
the exception frame directly.

stkadj is the offset of the replacement exception frame on the kernel 
stack. The replacement frame gets us into the user space signal handler 
instead of completing the exception right away. stkadj is used to skip 
that replacement exception frame used for the signal handler on the 
final rte (after a trip through sys_sigreturn to copy the original 
exception frame back on the kernel stack).

The offset we use for the signal stack on the user stack does not matter
here at all.

Or so my limited understanding goes...

I believe we can use USP to get a worst case estimate for the future
extent of the user stack. ...

What is the most data a moveml <...>,sp@- can take? If that's not too
much, a constant offset for the signal stack in case of format b frames
on 020/030 might be easiest.
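
To make the constant-offset idea concrete, here is a rough user-space model, not kernel code. The names sigframe_addr() and FRAME_FORMAT_B, the gap parameter, and the 8-byte alignment are assumptions for illustration only; the right gap size is exactly what is still under discussion.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name: exception frame format that gets the extra gap. */
#define FRAME_FORMAT_B 0xbu

/*
 * Model of choosing where the signal frame starts on the user stack.
 * 'gap' is the constant offset reserved below USP for writes the
 * faulting instruction may still perform when it is resumed.
 * The 8-byte alignment is an assumption of this sketch.
 */
static uint32_t sigframe_addr(uint32_t usp, size_t frame_size,
                              unsigned int format, uint32_t gap)
{
    uint32_t sp = usp;

    /* Only format b frames leave an unfinished instruction behind. */
    if (format == FRAME_FORMAT_B)
        sp -= gap;

    sp -= (uint32_t)frame_size;
    return sp & ~UINT32_C(7);
}
```

With usp = 0x80001000, a 100 byte frame and a 64 byte gap, this places the frame at 0x80000f58 for a format b fault and at 0x80000f98 otherwise.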

I think it's 64 bytes (16 registers). But we also have to consider all of
the other instructions that may write to the stack. There's probably a
reason why do_page_fault() picked a 256 byte gap (?)

That's not used as a gap, just to catch any user access below the user 
stack pointer.
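
As a minimal sketch of the two numbers in play (a user-space model with hypothetical names, not the actual do_page_fault() code): the moveml worst case is 16 registers of 4 bytes each, and the 256 byte slack only decides whether a user access below USP is treated as a bug.

```c
#include <stdbool.h>
#include <stdint.h>

/* Worst case for moveml <...>,sp@-: 16 registers, 4 bytes each. */
#define MOVEML_MAX_BYTES (16u * 4u)

/* Slack below USP that is still tolerated (hypothetical name). */
#define STACK_SLACK 256u

/*
 * Returns true if a user access at 'address' lies so far below the
 * user stack pointer that it would be rejected. The 64-bit sum avoids
 * wrap-around in this model.
 */
static bool access_below_stack(uint32_t address, uint32_t usp)
{
    return (uint64_t)address + STACK_SLACK < usp;
}
```

A full moveml pre-decrement store (64 bytes below USP) passes this check with room to spare; only an access more than 256 bytes below USP is rejected. Nothing here reserves space on the stack, which is why it is not a gap in the signal frame sense.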

But we need to find something that works in the general case (and then
analyze the performance impact it might have in stack and signal heavy
applications - I might have mentioned that before, but your equivalent
to Andreas' patch seemed quite a bit slower in the test case than when
signals were allowed after format b bus faults. Interrupt latency, most
likely).

The alternative is to use more stack memory, which means marginally more
paging. Choose your poison...

Yes - I'll have to run a few benchmarks to see which I'd prefer.

In the meantime, I'll send what I have at present as RFC.

Cheers,

	Michael