Re: [PATCH v3] fs/proc: Expose RSEQ configuration

Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> · Tue, 26 Jan 2021 11:25:47 -0800

On Tue, 26 Jan 2021 19:54:12 +0100 Piotr Figiel <figiel@xxxxxxxxxx> wrote:

> For userspace checkpoint and restore (C/R) some way of getting process
> state containing RSEQ configuration is needed.
> 
> There are two ways this information is going to be used:
>  - to re-enable RSEQ for threads which had it enabled before C/R
>  - to detect if a thread was in a critical section during C/R
> 
> Since C/R preserves TLS memory and addresses, the RSEQ ABI will be
> restored using the address registered before C/R.
> 
> Detecting whether the thread is in a critical section during C/R is
> needed to enforce the behavior of RSEQ abort during C/R. Attaching with
> ptrace() before registers are dumped does not by itself cause an RSEQ
> abort. Restoring the instruction pointer within the critical section is
> problematic because rseq_cs may get cleared before control is passed
> to the migrated application code, leading to RSEQ invariants not
> being preserved.
> 
> To achieve the above goals, expose the RSEQ structure address and the
> signature value through a new per-thread procfs file, "rseq".

Using "/proc/<pid>/rseq" would be more informative.

>  fs/exec.c      |  2 ++
>  fs/proc/base.c | 22 ++++++++++++++++++++++
>  kernel/rseq.c  |  4 ++++

A Documentation/ update would be appropriate.

>  3 files changed, 28 insertions(+)
> 
> diff --git a/fs/exec.c b/fs/exec.c
> index 5d4d52039105..5d84f98847f1 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1830,7 +1830,9 @@ static int bprm_execve(struct linux_binprm *bprm,
>  	/* execve succeeded */
>  	current->fs->in_exec = 0;
>  	current->in_execve = 0;
> +	task_lock(current);
>  	rseq_execve(current);
> +	task_unlock(current);

There's a comment over the task_lock() implementation which explains
what things it locks.  An update to that would be helpful.

> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -662,6 +662,22 @@ static int proc_pid_syscall(struct seq_file *m, struct pid_namespace *ns,
>  
>  	return 0;
>  }
> +
> +#ifdef CONFIG_RSEQ
> +static int proc_pid_rseq(struct seq_file *m, struct pid_namespace *ns,
> +				struct pid *pid, struct task_struct *task)
> +{
> +	int res = lock_trace(task);
> +
> +	if (res)
> +		return res;
> +	task_lock(task);
> +	seq_printf(m, "%px %08x\n", task->rseq, task->rseq_sig);
> +	task_unlock(task);
> +	unlock_trace(task);
> +	return 0;
> +}

Do we actually need task_lock() for this purpose?  Would
exec_update_lock alone (already taken by lock_trace()) be adequate and
appropriate?
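
For reference, here is a minimal userspace sketch of how a C/R tool might
consume the file, assuming the "%px %08x\n" format from the hunk above
stays as posted (a zero-padded hex address without a "0x" prefix, a
space, then the signature in hex).  The helper names parse_rseq_line()
and read_rseq_file() are hypothetical and not part of the patch, and the
final path of the file is whatever the patch ends up using.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse one "<hex address> <hex signature>" line as printed by the
 * proposed proc_pid_rseq().  Returns 0 on success, -1 on malformed input.
 * Hypothetical helper, not part of the patch.
 */
int parse_rseq_line(const char *line, uint64_t *addr, uint32_t *sig)
{
	unsigned long long a, s;
	char *end;

	errno = 0;
	a = strtoull(line, &end, 16);
	/* The address must be present and followed by a single space. */
	if (end == line || *end != ' ' || errno)
		return -1;

	line = end + 1;
	s = strtoull(line, &end, 16);
	/* The signature must be present and fit in 32 bits. */
	if (end == line || errno || s > UINT32_MAX)
		return -1;
	/* Only a newline or end of string may follow. */
	if (*end != '\n' && *end != '\0')
		return -1;

	*addr = (uint64_t)a;
	*sig = (uint32_t)s;
	return 0;
}

/*
 * Read the first line of the given procfs file and parse it.
 * The caller supplies the path; this sketch does not assume one.
 */
int read_rseq_file(const char *path, uint64_t *addr, uint32_t *sig)
{
	char buf[64];
	int ret = -1;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (fgets(buf, sizeof(buf), f))
		ret = parse_rseq_line(buf, addr, sig);
	fclose(f);
	return ret;
}
```

A checkpoint tool would call read_rseq_file() once per thread and keep
the pair to re-register RSEQ on restore.  Treating an all-zero address
as "RSEQ not registered" is an assumption here; the patch description
does not spell that out.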