On Wed, Nov 27, 2019 at 9:48 PM Jens Axboe <axboe@xxxxxxxxx> wrote: > On 11/27/19 12:23 PM, Jann Horn wrote: > > On Wed, Nov 27, 2019 at 6:11 AM Jens Axboe <axboe@xxxxxxxxx> wrote: > >> I posted this a few weeks back, took another look at it and refined it a > >> bit. I'd like some input on the viability of this approach. > >> > >> A new signalfd setup flag is added, SFD_TASK. This is only valid if used > >> with SFD_CLOEXEC. If set, the task setting up the signalfd descriptor is > >> remembered in the signalfd context, and will be the one we use for > >> checking signals in the poll/read handlers in signalfd. > >> > >> This is needed to make signalfd useful with io_uring and aio, of which > >> the former in particular has my interest. > >> > >> I _think_ this is sane. To prevent the case of a task clearing O_CLOEXEC > >> on the signalfd descriptor, forking, and then exiting, we grab a > >> reference to the task when we assign it. If that original task exits, we > >> catch it in signalfd_flush() and ensure waiters are woken up. > > > > Mh... that's not really reliable, because you only get ->flush() from > > the last exiting thread (or more precisely, the last exiting task that > > shares the files_struct). > > > > What is your goal here? To have a reference to a task without keeping > > the entire task_struct around in memory if someone leaks the signalfd > > to another process - basically like a weak pointer? If so, you could > > store a refcounted reference to "struct pid" instead of a refcounted > > reference to the task_struct, and then do the lookup of the > > task_struct on ->poll and ->read (similar to what procfs does). > > Yeah, I think that works out much better (and cleaner). How about this, > then? Follows your advice and turns it into a struct pid instead. I > don't particularly like the -ESRCH in dequeue and setup, what do you > think? For poll, POLLERR seems like a prudent choice. -ESRCH may be kinda weird, but I also can't think of anything better... and it does describe the problem pretty accurately: The task whose signal state you're trying to inspect is gone. I went through the list of errnos, and everything else sounded more weird... One more thing, though: We'll have to figure out some way to invalidate the fd when the target goes through execve(), in particular if it's a setuid execution. Otherwise we'll be able to just steal signals that were intended for the other task, that's probably not good. So we should: a) prevent using ->wait() on an old signalfd once the task has gone through execve() b) kick off all existing waiters c) most importantly, prevent ->read() on an old signalfd once the task has gone through execve() We probably want to avoid using the cred_guard_mutex here, since it is quite broad and has some deadlocking issues; it might make sense to put the update of ->self_exec_id in fs/exec.c under something like the siglock, and then for a) and c) we can check whether the ->self_exec_id changed while holding the siglock, and for b) we can add a call to signalfd_cleanup() after the ->self_exec_id change. > +static void signalfd_put_task(struct signalfd_ctx *ctx, struct task_struct *tsk) > +{ > + if (ctx->task_pid) > + put_task_struct(tsk); > +} > + > +static struct task_struct *signalfd_get_task(struct signalfd_ctx *ctx) > +{ > + if (ctx->task_pid) > + return get_pid_task(ctx->task_pid, PIDTYPE_PID); > + > + return current; > +} This works, and I guess it's a question of coding style... but I'd kinda prefer to do the refcount operation in both cases, so that the semantics of the returned reference are simply "holds a reference" instead of "either holds a reference or borrows from current depending on ctx->task_pid". But if you feel strongly about it, feel free to keep it as-is. [...] > - add_wait_queue(¤t->sighand->signalfd_wqh, &wait); > + add_wait_queue(&tsk->sighand->signalfd_wqh, &wait); > for (;;) { > set_current_state(TASK_INTERRUPTIBLE); > - ret = dequeue_signal(current, &ctx->sigmask, info); > + ret = dequeue_signal(tsk, &ctx->sigmask, info); > if (ret != 0) > break; > if (signal_pending(current)) { > ret = -ERESTARTSYS; > break; > } > - spin_unlock_irq(¤t->sighand->siglock); > + spin_unlock_irq(&tsk->sighand->siglock); > schedule(); Should we be dropping the reference to the task before schedule() and re-acquiring it afterwards so that if we're blocked on a signalfd read and then the corresponding task dies, the refcount can drop to zero and we can get woken up? Probably doesn't matter, but seems a bit cleaner to me. > - spin_lock_irq(¤t->sighand->siglock); > + spin_lock_irq(&tsk->sighand->siglock); > } > - spin_unlock_irq(¤t->sighand->siglock); > + spin_unlock_irq(&tsk->sighand->siglock); > > - remove_wait_queue(¤t->sighand->signalfd_wqh, &wait); > + remove_wait_queue(&tsk->sighand->signalfd_wqh, &wait); > __set_current_state(TASK_RUNNING); > > + signalfd_put_task(ctx, tsk); > return ret; > } > > @@ -267,19 +296,24 @@ static int do_signalfd4(int ufd, sigset_t *mask, int flags) > /* Check the SFD_* constants for consistency. */ > BUILD_BUG_ON(SFD_CLOEXEC != O_CLOEXEC); > BUILD_BUG_ON(SFD_NONBLOCK != O_NONBLOCK); > + BUILD_BUG_ON(SFD_TASK & (SFD_CLOEXEC | SFD_NONBLOCK)); > > - if (flags & ~(SFD_CLOEXEC | SFD_NONBLOCK)) > + if (flags & ~(SFD_CLOEXEC | SFD_NONBLOCK | SFD_TASK)) > + return -EINVAL; > + if ((flags & (SFD_CLOEXEC | SFD_TASK)) == SFD_TASK) > return -EINVAL; (non-actionable comment: It seems kinda weird that you can specify these parameters with no effect for the `uffd != -1` case... but since the existing parameters already work that way, I guess it's consistent.)