On Fri, May 21, 2021 at 11:33:14PM +0200, Jann Horn wrote: > > SYSCALL_DEFINE2(umcg_wake, u32, flags, u32, next_tid) > > { > > - return -ENOSYS; > > + struct umcg_task_data *next_utd; > > + struct task_struct *next; > > + int ret = -EINVAL; > > + > > + if (!next_tid) > > + return -EINVAL; > > + if (flags) > > + return -EINVAL; > > + > > + next = find_get_task_by_vpid(next_tid); > > + if (!next) > > + return -ESRCH; > > + rcu_read_lock(); > > Wouldn't it be more efficient to replace the last 4 lines with the following? > > rcu_read_lock(); > next = find_task_by_vpid(next_tid); > if (!next) { > err = -ESRCH; > goto out; > } This wakeup crud needs to modify the umcg->state, which is a user variable. That can't be done under RCU. Weirdly the proposed code doesn't actually do any of that for undocumented raisins :/ > Then you don't need to use refcounting here... > > > + next_utd = rcu_dereference(next->umcg_task_data); > > + if (!next_utd) > > + goto out; > > + > > + if (!READ_ONCE(next_utd->in_wait)) { > > + ret = -EAGAIN; > > + goto out; > > + } > > + > > + ret = wake_up_process(next); > > + put_task_struct(next); > > ... and you'd be able to drop this put_task_struct(), too. > > > + if (ret) > > + ret = 0; > > + else > > + ret = -EAGAIN; > > + > > +out: > > + rcu_read_unlock(); > > + return ret; > > } > > > > /** > > @@ -139,5 +325,44 @@ SYSCALL_DEFINE2(umcg_wake, u32, flags, u32, next_tid) > > SYSCALL_DEFINE4(umcg_swap, u32, wake_flags, u32, next_tid, u32, wait_flags, > > const struct __kernel_timespec __user *, timeout) > > { > > - return -ENOSYS; > > + struct umcg_task_data *curr_utd; > > + struct umcg_task_data *next_utd; > > + struct task_struct *next; > > + int ret = -EINVAL; > > + > > + rcu_read_lock(); > > + curr_utd = rcu_dereference(current->umcg_task_data); > > + > > + if (!next_tid || wake_flags || wait_flags || !curr_utd) > > + goto out; > > + > > + if (timeout) { > > + ret = -EOPNOTSUPP; > > + goto out; > > + } > > + > > + next = find_get_task_by_vpid(next_tid); > > + if (!next) { > > + ret = -ESRCH; > > + goto out; > > + } > > There isn't any type of access check here, right? Any task can wake up > any other task? That feels a bit weird to me - and if you want to keep > it as-is, it should probably at least be documented that any task on > the system can send you spurious wakeups if you opt in to umcg. You can only send wakeups to other UMCG thingies, per the next->umcg_task_data check below. That said.. > In contrast, shared futexes can avoid this because they get their > access control implicitly from the VMA. Every task must expect spurious wakups at all times, always (for TASK_NORMAL wakeups that is). There's plenty ways to generate them. > > + next_utd = rcu_dereference(next->umcg_task_data); > > + if (!next_utd) { > > + ret = -EINVAL; > > + goto out; > > + }