On Tue, Jan 26, 2021 at 03:58:46PM -0500, Mathieu Desnoyers wrote: > ----- On Jan 26, 2021, at 1:54 PM, Piotr Figiel figiel@xxxxxxxxxx wrote: > [...] > > diff --git a/kernel/rseq.c b/kernel/rseq.c > > index a4f86a9d6937..6aea67878065 100644 > > --- a/kernel/rseq.c > > +++ b/kernel/rseq.c > > @@ -322,8 +322,10 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, > > rseq_len, > > ret = rseq_reset_rseq_cpu_id(current); > > if (ret) > > return ret; > > + task_lock(current); > > current->rseq = NULL; > > current->rseq_sig = 0; > > + task_unlock(current); > > return 0; > > } > > > > @@ -353,8 +355,10 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, > > rseq_len, > > return -EINVAL; > > if (!access_ok(rseq, rseq_len)) > > return -EFAULT; > > + task_lock(current); > > current->rseq = rseq; > > current->rseq_sig = sig; > > + task_unlock(current); > > So AFAIU, the locks are there to make sure that whenever a user-space > thread reads that state through that new /proc file ABI, it observes > coherent "rseq" vs "rseq_sig" values. Yes, that was the intention. > However, I'm not convinced this is the right approach to consistency > here. > > Because if you add locking as done here, you ensure that the /proc > file reader sees coherent values, but between the point where those > values are read from kernel-space, copied to user-space, and then > acted upon by user-space, those can very well have become outdated if > the observed process runs concurrently. You are right here, but I think this comment is valid for most of the process information exported via procfs. The user can almost always make a time of check/time of use race when interacting with procfs. I agree that the locking added in v3 doesn't help much, but at least it does provide a well defined answer: i.e. at least in some point of time the effective configuration was as returned. It makes it a bit easier to document and reason about the file contents, compared to the inconsistent version. > So my understanding here is that the only non-racy way to effectively > use those values is to either read them from /proc/self/* (from the > thread owning the task struct), or to ensure that the thread is > stopped/frozen while the read is done. Constraining this solely to the owning thread I think is a bit too limiting. I think we could limit it to stopped threads but I don't think it eliminates the potential of time of check/time of use races for the user. In this shape as in v3 - it's up to the user to decide if there is a relevant risk of a race, if it's unwanted then the thread can be stopped with e.g. ptrace, cgroup freeze or SIGSTOP. Best regards, Piotr.