On Fri, Feb 10 2023 at 02:25, Gregory Price wrote: > > As the ABI of these intercepted syscalls is unknown to Linux, these > -syscalls are not instrumentable via ptrace or the syscall tracepoints. > +syscalls are not instrumentable via ptrace or the syscall tracepoints, > +however an interfaces to suspend, checkpoint, and restore syscall user > +dispatch configuration has been added to ptrace to assist userland > +checkpoint/restart software. The above is incomprehensible word salad to me. Once the ptrace functions are there then they can be used independent of CRIU, no? > + * struct ptrace_sud_config - Per-task configuration for SUD > + * @mode: One of PR_SYS_DISPATCH_ON or PR_SYS_DISPATCH_OFF > + * @selector: Tracee's user virtual address of SUD selector > + * @offset: SUD exclusion area (virtual address) > + * @len: Length of SUD exclusion area > + * > + * Used to get/set the syscall user dispatch configuration for tracee. > + * process. Selector is optional (may be NULL), and if invalid will produce > + * a SIGSEGV in the tracee upon first access. > + * > + * If mode is PR_SYS_DISPATCH_ON, syscall dispatch will be enabled. If > + * PR_SYS_DISPATCH_OFF, syscall dispatch will be disabled and all other > + * parameters must be 0. The value in *selector (if not null), also determines > + * whether syscall dispatch will occur. > + * > + * The SUD Exclusion area described by offset/len is the virtual address space > + * from which syscalls will not produce a user dispatch. > + */ > +struct ptrace_sud_config { > + __u64 mode; > + __s8 *selector; How is this correct for a 32bit ptracer running on a 64bit kernel? Aside of not wiring up the compat syscall without any argumentation in the changelog. > --- a/kernel/entry/syscall_user_dispatch.c > +++ b/kernel/entry/syscall_user_dispatch.c This section: > -int set_syscall_user_dispatch(unsigned long mode, unsigned long offset, > - unsigned long len, char __user *selector) > +static int task_set_syscall_user_dispatch(struct task_struct *task, unsigned long mode, > + unsigned long offset, unsigned long len, > + char __user *selector) > { > switch (mode) { > case PR_SYS_DISPATCH_OFF: > @@ -94,15 +96,60 @@ int set_syscall_user_dispatch(unsigned long mode, unsigned long offset, > return -EINVAL; > } > > - current->syscall_dispatch.selector = selector; > - current->syscall_dispatch.offset = offset; > - current->syscall_dispatch.len = len; > - current->syscall_dispatch.on_dispatch = false; > + task->syscall_dispatch.selector = selector; > + task->syscall_dispatch.offset = offset; > + task->syscall_dispatch.len = len; > + task->syscall_dispatch.on_dispatch = false; > > if (mode == PR_SYS_DISPATCH_ON) > - set_syscall_work(SYSCALL_USER_DISPATCH); > + set_task_syscall_work(task, SYSCALL_USER_DISPATCH); > else > - clear_syscall_work(SYSCALL_USER_DISPATCH); > + clear_task_syscall_work(task, SYSCALL_USER_DISPATCH); > > return 0; > } > + > +int set_syscall_user_dispatch(unsigned long mode, unsigned long offset, > + unsigned long len, char __user *selector) > +{ > + return task_set_syscall_user_dispatch(current, mode, offset, len, selector); > +} until here want's to be a seperate preparatory patch. > +++ b/tools/testing/selftests/ptrace/get_set_sud.c > + child = fork(); > + ASSERT_GE(child, 0); > + if (child == 0) { > + ASSERT_EQ(0, sys_ptrace(PTRACE_TRACEME, 0, 0, 0)) { > + TH_LOG("PTRACE_TRACEME: %m"); > + } > + kill(getpid(), SIGSTOP); > + sleep(2); The purpose of this sleep is what? > + _exit(1); > + } > + > + waitpid(child, &status, 0); > + > + config.mode = PR_SYS_DISPATCH_ON; > + config.selector = (void*)0xDEADBEEF; > + config.offset = 0x12345678; > + config.len = 0x87654321; What's the purpose of these magic numbers? memset(&config, 0xff,...) is sufficient, no? Thanks, tglx