Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:
On Fri, Jun 11, 2021 at 2:40 PM Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
Looking at copy_thread it looks like at least on alpha we are dealing
with a structure that defines all of the registers in copy_thread.
On the target side, yes.
On the _source_ side, the code does
	struct pt_regs *regs = current_pt_regs();
and that's the part that means that fork() and related functions need
to have done that DO_SWITCH_STACK(), so that they have the full
register set to be copied.
Otherwise it would copy random contents from the source stack.
But that
	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
ends up protecting us, and the code never uses that set of source
registers for the io worker threads.
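
To make the point about that test concrete, here is a toy model in plain C. It is only a sketch, not the alpha copy_thread: the toy_* names, the flag values and the four-register frame are all made up for illustration. What it shows is the shape of the guard: for kernel threads and io workers the source frame is never read, so it does not matter whether the caller saved the full register set.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical flag values and types, for illustration only. */
#define TOY_PF_KTHREAD   0x1u
#define TOY_PF_IO_WORKER 0x2u

struct toy_regs {
	unsigned long r[4];
};

struct toy_task {
	unsigned int flags;
	struct toy_regs regs;
};

/*
 * Fill in the child's register frame.  Kernel threads and io workers
 * get a zeroed frame; only ordinary user tasks copy from the source
 * frame.  Returns 1 if the source frame was read, 0 otherwise.
 */
static int toy_copy_thread(struct toy_task *child, const struct toy_regs *src)
{
	if (child->flags & (TOY_PF_KTHREAD | TOY_PF_IO_WORKER)) {
		memset(&child->regs, 0, sizeof(child->regs));
		return 0;
	}
	child->regs = *src;
	return 1;
}
```

Calling toy_copy_thread() on a task with TOY_PF_IO_WORKER set leaves the child with zeroed registers no matter what the source frame contains.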
That is the test in copy_thread, and it isn't the case I am worried about.
So io_uring looks fine on alpha. I didn't check m68k and friends, but
I think they have the same thing going.
As I have read through the code more I don't think so.
The code paths I am worried about are:
    ret_from_kernel_thread
        io_wqe_worker
            get_signal
                do_coredump
                ptrace_stop

    ret_from_kernel_thread
        io_sq_thread
            get_signal
                do_coredump
                ptrace_stop
As I understand the code the new thread created by create_thread
initially has a full complement of registers, and then is started
by alpha_switch_to:
	.align 4
	.globl alpha_switch_to
	.type alpha_switch_to, @function
	.cfi_startproc
alpha_switch_to:
	DO_SWITCH_STACK
	call_pal PAL_swpctx
	lda $8, 0x3fff
	UNDO_SWITCH_STACK
	bic $sp, $8, $8
	mov $17, $0
	ret
	.cfi_endproc
	.size alpha_switch_to, .-alpha_switch_to
alpha_switch_to removes the extra registers from the stack and then
executes ret which, if I understand alpha assembly correctly, is
equivalent to jumping to where $26 points. That is
ret_from_kernel_thread (as set up by copy_thread).
This leaves ret_from_kernel_thread and everything it calls without
the extra context saved on the stack.
I am still trying to understand how we get registers populated at a
fixed offset on the stack during schedule, as it looks like switch_to
assumes the stack pointer is in the proper location.
It looks like we just need something like this to cover the userspace
side of exit.
Looks correct to me. Except I think you could just use "fork_like()"
instead of creating a new (and identical) "exit_like()" macro.
But I really wish we had some way to test and trigger this so that we
wouldn't get caught on this again. Something in task_pt_regs() that
catches "this doesn't actually work" and does a WARN_ON_ONCE() on the
affected architectures?
I think that would require pushing an extra magic value in SWITCH_STACK
and not just popping it but deliberately changing that value in
UNDO_SWITCH_STACK. Basically stack canaries.
I don't see how we could do it in an arch independent way though.
No, I think you're right. There's no obvious generic solution to it,
and once we look at arch-specific ones we're back to "just alpha,
m68k and nios needs this or cares" and once you're there you might as
well just fix it.
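
For reference, the canary idea discussed above (a magic value pushed by SWITCH_STACK and deliberately changed by UNDO_SWITCH_STACK) might look roughly like this as a userspace C model. This is a sketch under stated assumptions, not a patch: the toy_* names, the six-register layout and both magic constants are invented here, and the real macros are assembly, not C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical magic values; not taken from any kernel source. */
#define TOY_SS_MAGIC  0x5357495443485354ULL
#define TOY_SS_POISON 0xdeadbeefdeadbeefULL

#define TOY_NR_CALLEE_SAVED 6

/* Stand-in for struct switch_stack, with one extra canary slot. */
struct toy_switch_stack {
	uint64_t callee_saved[TOY_NR_CALLEE_SAVED];
	uint64_t magic;
};

/* Model of SWITCH_STACK: save the registers and arm the canary. */
static void toy_do_switch_stack(struct toy_switch_stack *ss,
				const uint64_t regs[TOY_NR_CALLEE_SAVED])
{
	for (int i = 0; i < TOY_NR_CALLEE_SAVED; i++)
		ss->callee_saved[i] = regs[i];
	ss->magic = TOY_SS_MAGIC;
}

/* Model of UNDO_SWITCH_STACK: restore, then deliberately poison. */
static void toy_undo_switch_stack(struct toy_switch_stack *ss,
				  uint64_t regs[TOY_NR_CALLEE_SAVED])
{
	for (int i = 0; i < TOY_NR_CALLEE_SAVED; i++)
		regs[i] = ss->callee_saved[i];
	ss->magic = TOY_SS_POISON;
}

/* What a WARN_ON_ONCE()-style check could test before trusting the slot. */
static int toy_switch_stack_valid(const struct toy_switch_stack *ss)
{
	return ss->magic == TOY_SS_MAGIC;
}
```

A reader of the saved registers would call toy_switch_stack_valid() first; after toy_undo_switch_stack() the check fails, which is the "this doesn't actually work" case the warning is meant to catch.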
ia64 has some "fast system call" model with limited registers too, but
I think that's limited to just a few very special system calls (ie it
does the reverse of what alpha does: alpha does the fast case by
default, and then marks fork/vfork/clone as special).
I wonder if the arch specific solution should be to move the registers
to a fixed location in task_struct (perhaps thread_struct) so that the
same patterns can apply across all architectures and we don't get
surprises at all.
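
A rough C model of what I mean by a fixed location, purely as a sketch: the toy_* types and the six-register save area are hypothetical, and nothing here matches the real alpha thread_struct. The point is only that the accessor always returns the same address for a given task, independent of how deep the kernel stack is or which entry path was taken.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NR_CALLEE_SAVED 6

/* Hypothetical per-task save area for the callee-saved registers. */
struct toy_callee_regs {
	uint64_t r[TOY_NR_CALLEE_SAVED];
};

/* Stand-in for thread_struct; the save area lives here, not on the stack. */
struct toy_thread_struct {
	struct toy_callee_regs saved;
};

struct toy_task {
	struct toy_thread_struct thread;
};

/* Called on every entry path that may later need the full register set. */
static void toy_save_callee_regs(struct toy_task *t,
				 const struct toy_callee_regs *live)
{
	t->thread.saved = *live;
}

/* Always the same location for a given task. */
static const struct toy_callee_regs *
toy_task_callee_regs(const struct toy_task *t)
{
	return &t->thread.saved;
}
```

The cost in a real implementation would be saving those registers on paths that today skip them, which is exactly the fast case alpha optimizes for.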
What appears to be unique about alpha, m68k, and nios is that
space is not always reserved for all of the registers, so we can't
always count on them being saved after a task switch.
Eric