----- On Jan 14, 2021, at 1:54 PM, Piotr Figiel figiel@xxxxxxxxxx wrote: Added PeterZ, Paul and Boqun to CC. They are also listed as maintainers of rseq. Please CC them in your next round of patches. > For userspace checkpoint and restore (C/R) some way of getting process > state containing RSEQ configuration is needed. > > There are two ways this information is going to be used: > - to re-enable RSEQ for threads which had it enabled before C/R > - to detect if a thread was in a critical section during C/R > > Since C/R preserves TLS memory and addresses RSEQ ABI will be restored > using the address registered before C/R. Indeed, if the process goes through a checkpoint/restore while within a rseq c.s., that critical section should abort. Given that it's only the restored process which resumes user-space execution, there should be some way to ensure that the rseq tls pointer is restored before that thread goes back to user-space, or some way to ensure the rseq TLS is registered before that thread returns to the saved instruction pointer. How do you plan to re-register the rseq TLS for each thread upon restore ? I suspect you move the return IP to the abort either at checkpoint or restore if you detect that the thread is running in a rseq critical section. > > Detection whether the thread is in a critical section during C/R is > needed to enforce behavior of RSEQ abort during C/R. Attaching with > ptrace() before registers are dumped itself doesn't cause RSEQ abort. Right, because the RSEQ abort is only done when going back to user-space, and AFAIU the checkpointed process will cease to exist, and won't go back to user-space, therefore bypassing any RSEQ abort. > Restoring the instruction pointer within the critical section is > problematic because rseq_cs may get cleared before the control is > passed to the migrated application code leading to RSEQ invariants not > being preserved. The commit message should state that both the per-thread rseq TLS area address and the signature are dumped within this new proc file. > > Signed-off-by: Piotr Figiel <figiel@xxxxxxxxxx> > > --- > > v2: > - fixed string formatting for 32-bit architectures > > v1: > - https://lkml.kernel.org/r/20210113174127.2500051-1-figiel@xxxxxxxxxx > > --- > fs/proc/base.c | 21 +++++++++++++++++++++ > 1 file changed, 21 insertions(+) > > diff --git a/fs/proc/base.c b/fs/proc/base.c > index b3422cda2a91..7cc36a224b8b 100644 > --- a/fs/proc/base.c > +++ b/fs/proc/base.c > @@ -662,6 +662,21 @@ static int proc_pid_syscall(struct seq_file *m, struct > pid_namespace *ns, > > return 0; > } > + > +#ifdef CONFIG_RSEQ > +static int proc_pid_rseq(struct seq_file *m, struct pid_namespace *ns, > + struct pid *pid, struct task_struct *task) > +{ > + int res = lock_trace(task); AFAIU lock_trace prevents concurrent exec() from modifying the task's content. What prevents a concurrent rseq register/unregister to be executed concurrently with proc_pid_rseq ? > + > + if (res) > + return res; > + seq_printf(m, "%tx %08x\n", (ptrdiff_t)((uintptr_t)task->rseq), I wonder if all those parentheses are needed. Wouldn't it be enough to have: (ptrdiff_t)(uintptr_t)task->rseq ? Thanks, Mathieu > + task->rseq_sig); > + unlock_trace(task); > + return 0; > +} > +#endif /* CONFIG_RSEQ */ > #endif /* CONFIG_HAVE_ARCH_TRACEHOOK */ > > /************************************************************************/ > @@ -3182,6 +3197,9 @@ static const struct pid_entry tgid_base_stuff[] = { > REG("comm", S_IRUGO|S_IWUSR, proc_pid_set_comm_operations), > #ifdef CONFIG_HAVE_ARCH_TRACEHOOK > ONE("syscall", S_IRUSR, proc_pid_syscall), > +#ifdef CONFIG_RSEQ > + ONE("rseq", S_IRUSR, proc_pid_rseq), > +#endif > #endif > REG("cmdline", S_IRUGO, proc_pid_cmdline_ops), > ONE("stat", S_IRUGO, proc_tgid_stat), > @@ -3522,6 +3540,9 @@ static const struct pid_entry tid_base_stuff[] = { > &proc_pid_set_comm_operations, {}), > #ifdef CONFIG_HAVE_ARCH_TRACEHOOK > ONE("syscall", S_IRUSR, proc_pid_syscall), > +#ifdef CONFIG_RSEQ > + ONE("rseq", S_IRUSR, proc_pid_rseq), > +#endif > #endif > REG("cmdline", S_IRUGO, proc_pid_cmdline_ops), > ONE("stat", S_IRUGO, proc_tid_stat), > -- > 2.30.0.284.gd98b1dd5eaa7-goog -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com