On Mon, Feb 13, 2023 at 09:26:21PM +0100, Thomas Gleixner wrote: > On Fri, Feb 10 2023 at 02:25, Gregory Price wrote: > > > > As the ABI of these intercepted syscalls is unknown to Linux, these > > -syscalls are not instrumentable via ptrace or the syscall tracepoints. > > +syscalls are not instrumentable via ptrace or the syscall tracepoints, > > +however an interfaces to suspend, checkpoint, and restore syscall user > > +dispatch configuration has been added to ptrace to assist userland > > +checkpoint/restart software. > > The above is incomprehensible word salad to me. Once the ptrace > functions are there then they can be used independent of CRIU, no? > The verbiage here is half-baked, I'll just break out a separate paragraph to explain better (or drop entirely, if that's preferable). Since SUD isn't really designed for anything other than syscall emulation, there's not much of a use for these get/set interfaces outside the context of checkpoint/restart. GDB and friends are already perfectly happy and capable of debugging SUD enabled software in the absense of these interfaces and have no need to disable the feature. > > + * struct ptrace_sud_config - Per-task configuration for SUD > > + * @mode: One of PR_SYS_DISPATCH_ON or PR_SYS_DISPATCH_OFF > > + * @selector: Tracee's user virtual address of SUD selector > > + * @offset: SUD exclusion area (virtual address) > > + * @len: Length of SUD exclusion area > > + * > > + * Used to get/set the syscall user dispatch configuration for tracee. > > + * process. Selector is optional (may be NULL), and if invalid will produce > > + * a SIGSEGV in the tracee upon first access. > > + * > > + * If mode is PR_SYS_DISPATCH_ON, syscall dispatch will be enabled. If > > + * PR_SYS_DISPATCH_OFF, syscall dispatch will be disabled and all other > > + * parameters must be 0. The value in *selector (if not null), also determines > > + * whether syscall dispatch will occur. > > + * > > + * The SUD Exclusion area described by offset/len is the virtual address space > > + * from which syscalls will not produce a user dispatch. > > + */ > > +struct ptrace_sud_config { > > + __u64 mode; > > + __s8 *selector; > > How is this correct for a 32bit ptracer running on a 64bit kernel? Aside > of not wiring up the compat syscall without any argumentation in the > changelog. > you're right, these would need to be unsigned long/pointers, i will take a closer look at how ptrace manages this elsewhere and come back. > > > --- a/kernel/entry/syscall_user_dispatch.c > > +++ b/kernel/entry/syscall_user_dispatch.c > > This section: > > > -int set_syscall_user_dispatch(unsigned long mode, unsigned long offset, > > - unsigned long len, char __user *selector) > > +static int task_set_syscall_user_dispatch(struct task_struct *task, unsigned long mode, > > + unsigned long offset, unsigned long len, > > + char __user *selector) > > { > > switch (mode) { > > case PR_SYS_DISPATCH_OFF: > > @@ -94,15 +96,60 @@ int set_syscall_user_dispatch(unsigned long mode, unsigned long offset, > > return -EINVAL; > > } > > > > - current->syscall_dispatch.selector = selector; > > - current->syscall_dispatch.offset = offset; > > - current->syscall_dispatch.len = len; > > - current->syscall_dispatch.on_dispatch = false; > > + task->syscall_dispatch.selector = selector; > > + task->syscall_dispatch.offset = offset; > > + task->syscall_dispatch.len = len; > > + task->syscall_dispatch.on_dispatch = false; > > > > if (mode == PR_SYS_DISPATCH_ON) > > - set_syscall_work(SYSCALL_USER_DISPATCH); > > + set_task_syscall_work(task, SYSCALL_USER_DISPATCH); > > else > > - clear_syscall_work(SYSCALL_USER_DISPATCH); > > + clear_task_syscall_work(task, SYSCALL_USER_DISPATCH); > > > > return 0; > > } > > + > > +int set_syscall_user_dispatch(unsigned long mode, unsigned long offset, > > + unsigned long len, char __user *selector) > > +{ > > + return task_set_syscall_user_dispatch(current, mode, offset, len, selector); > > +} > > until here want's to be a seperate preparatory patch. > I had considered this, but didn't know what was preferable, given that there's not much reason to create the functions outside the context of this patch. Will do. > > +++ b/tools/testing/selftests/ptrace/get_set_sud.c > > + child = fork(); > > + ASSERT_GE(child, 0); > > + if (child == 0) { > > + ASSERT_EQ(0, sys_ptrace(PTRACE_TRACEME, 0, 0, 0)) { > > + TH_LOG("PTRACE_TRACEME: %m"); > > + } > > + kill(getpid(), SIGSTOP); > > + sleep(2); > > The purpose of this sleep is what? > artifact of taking other tests as an outline, will drop it and rerun. > > + _exit(1); > > + } > > + > > + waitpid(child, &status, 0); > > + > > + config.mode = PR_SYS_DISPATCH_ON; > > + config.selector = (void*)0xDEADBEEF; > > + config.offset = 0x12345678; > > + config.len = 0x87654321; > > What's the purpose of these magic numbers? memset(&config, 0xff,...) is > sufficient, no? > > Thanks, > > tglx Nothing, will drop. ~Gregory