Hi Linus,
On 22/06/21 11:14 am, Linus Torvalds wrote:
On Mon, Jun 21, 2021 at 12:45 PM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
Looks like sys_exit() and do_group_exit() would be the two places to
do it (do_group_exit() would handle the signal case and
sys_group_exit()).
Maybe... I'm digging through that pile right now, will follow up when
I get a reasonably complete picture
We might have another possible way to solve this:
(a) make it the rule that everybody always saves the full (integer)
register set in pt_regs
(b) make m68k just always create that switch-stack for all system
calls (it's really not that big, I think it's like six words or
something)
Turns out that is harder than it looked at first glance (at least for me).
All syscalls that _do_ save the switch stack are currently called
through wrappers which pull the syscall arguments out of the saved
pt_regs on the stack (pushing the switch stack after the SAVE_ALL saved
stuff buries the syscall arguments on the stack, see comment about
m68k_clone(). We'd have to push the switch stack _first_ when entering
system_call to leave the syscall arguments in place, but that will
require further changes to the syscall exit path (currently shared with
the interrupt exit path). Not to mention the register offset
calculations in arch/m68k/kernel/ptrace.c, and perhaps a few other
dependencies that don't come to mind immediately.
We have both pt_regs and switch_stack in uapi/asm/ptrace.h, but the
ordering of the two is only mentioned in a comment. Can we reorder them
on the stack, as long as we don't change the struct definitions proper?
This will take a little more time to work out and test - certainly not
before the weekend. I'll send a corrected version of my debug patch
before that.
Cheers,
Michael
(c) admit that alpha is broken, but nobody really cares
In the meanwhile, do kernel/kthread.c uses look even remotely sane?
Intentional - sure, but it really looks wrong to use thread exit code
as communication channel there...
I really doubt that it is even "intentional".
I think it's "use some errno as a random exit code" and nobody ever
really thought about it, or thought about how that doesn't really
work. People are used to the error numbers, not thinking about how
do_exit() doesn't take an error number, but a signal number (and an
8-bit positive error code in bits 8-15).
Because no, it's not even remotely sane.
I think the do_exit(-EINTR) could be do_exit(SIGINT) and it would make
more sense. And the -ENOMEM might be SIGBUS, perhaps.
It does look like the usermode-helper code does save the exit code
with things like
kernel_wait(pid, &sub_info->retval);
and I see call_usermodehelper_exec() doing
retval = sub_info->retval;
and treating it as an error code. But I think those have never been
tested with that (bogus) exit code thing from kernel_wait(), because
it wouldn't have worked. It has only ever been tested with the (real)
exit code things like
if (pid < 0) {
sub_info->retval = pid;
which does actually assign a negative error code to it.
So I think that
kernel_wait(pid, &sub_info->retval);
line is buggy, and should be something like
int wstatus;
kernel_wait(pid, &wstatus);
sub_info->retval = WEXITSTATUS(wstatus) ? -EINVAL : 0;
or something.
Linus