Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> writes: > On 2020/06/30 21:29, Eric W. Biederman wrote: >> Hmm. The wake up happens just of tgid->wait_pidfd happens just before >> release_task is called so there is a race. As it is possible to wake >> up and then go back to sleep before pid_has_task becomes false. > > What is the reason we want to wait until pid_has_task() becomes false? > > - wait_event(tgid->wait_pidfd, !pid_has_task(tgid, PIDTYPE_TGID)); > + while (!wait_event_timeout(tgid->wait_pidfd, !pid_has_task(tgid, PIDTYPE_TGID), 1)); So that it is safe to call bpfilter_umh_cleanup. The previous code performed the wait by having a callback in do_exit. It might be possible to call bpf_umh_cleanup early but I have not done that analysis. To perform the test correctly what I have right now is: bool thread_group_exited(struct pid *pid) { struct task_struct *tsk; bool exited; rcu_read_lock(); tsk = pid_task(pid, PIDTYPE_PID); exited = !tsk || (READ_ONCE(tsk->exit_state) && thread_group_empty(tsk)); rcu_read_unlock(); return exited; } Which is factored out of pidfd_poll. Which means that this won't be something that the bpfilter code has to maintain. That seems to be a fundamentally good facility to have regardless of bpfilter. I will post the whole thing in a bit once I have a chance to dot my i's and cross my t's. > By the way, commit 4a9d4b024a3102fc ("switch fput to task_work_add") says > that use of flush_delayed_fput() has to be careful. Al, is it safe to call > flush_delayed_fput() from blob_to_mnt() from umd_load_blob() (which might be > called from both kernel thread and from process context (e.g. init_module() > syscall by /sbin/insmod )) ? And __fput_sync needs to be even more careful. umd_load_blob is called in these changes without any locks held. We fundamentally AKA in any correct version of this code need to flush the file descriptor before we call exec or exec can not open it a read-only denying all writes from any other opens. The use case of flush_delayed_fput is exactly the same as that used when loading the initramfs. Eric