On Fri, Sep 27, 2024 at 02:04:13PM GMT, Alice Ryhl wrote:
> On Thu, Sep 26, 2024 at 6:36 PM Christian Brauner <brauner@xxxxxxxxxx> wrote:
> >
> > Ok, so here's my feeble attempt at getting something going for wrapping
> > struct pid_namespace as struct pid_namespace indirectly came up in the
> > file abstraction thread.
>
> This looks great! Thanks!
>
> > The lifetime of a pid namespace is intimately tied to the lifetime of
> > task. The pid namespace of a task doesn't ever change. A
> > unshare(CLONE_NEWPID) or setns(fd_pidns/pidfd, CLONE_NEWPID) will not
> > change the task's pid namespace only the pid namespace of children
> > spawned by the task. This invariant is important to keep in mind.
> >
> > After a task is reaped it will be detached from its associated struct
> > pids via __unhash_process(). This will also set task->thread_pid to
> > NULL.
> >
> > In order to retrieve the pid namespace of a task task_active_pid_ns()
> > can be used. The helper works on both current and non-current tasks but
> > the requirements are slightly different in both cases and it depends on
> > where the helper is called.
> >
> > The rules for this are simple but difficult for me to translate into
> > Rust. If task_active_pid_ns() is called on current then no RCU locking
> > is needed as current is obviously alive. On the other hand calling
> > task_active_pid_ns() after release_task() would work but it would mean
> > task_active_pid_ns() will return NULL.
> >
> > Calling task_active_pid_ns() on a non-current task, while valid, must be
> > under RCU or other protection mechanism as the task might be
> > release_task() and thus in __unhash_process().
>
> Just to confirm, calling task_active_pid_ns() on a non-current task
> requires the rcu lock even if you own a refcount on the task?

Interesting question. Afaik, yes. task_active_pid_ns() goes via
task->thread_pid which is a shorthand for task->pid_links[PIDTYPE_PID].
This will be NULLed when the task exits and is dead (so usually when
someone has waited on it; ignoring ptrace for sanity reasons, and
autoreaping, which amounts to the same thing just in-kernel):

T1              T2                                  T3
exit(0);
                wait(T1)
                -> wait_task_zombie()
                   -> release_task()
                      -> __exit_signals()
                         -> __unhash_process()
                            // sets task->thread_pid = NULL
                                                    task_active_pid_ns(T1)
                                                    // task->pid_links[PIDTYPE_PID] == NULL

So having a reference to struct task_struct doesn't prevent
task->thread_pid becoming NULL.

And you touch upon a very interesting point. The lifetime of struct
pid_namespace is actually tied to struct pid much tighter than it is to
struct task_struct. So when a task is released (transitions from zombie
to dead in the common case) the following happens:

release_task()
-> __exit_signals()
   -> thread_pid = get_pid(task->thread_pid)
   -> __unhash_process()
      -> detach_pid(PIDTYPE_PID)
         -> __change_pid()
            {
                    task->thread_pid = NULL;
                    task->pid_links[PIDTYPE_PID] = NULL;
                    free_pid(thread_pid)
            }
   put_pid(thread_pid)

And the free_pid() in __change_pid() does a delayed_put_pid() via
call_rcu().

So afaiu, taking the rcu_read_lock() synchronizes against that
delayed_put_pid() in __change_pid() so the call_rcu() will wait until
everyone who does

        rcu_read_lock()
        task_active_pid_ns(task)
        rcu_read_unlock()

and sees task->thread_pid non-NULL, is done. This way no additional
reference count on struct task_struct or struct pid is needed before
plucking the pid namespace from there.

Does that make sense or have I gotten it all wrong?
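
For illustration only, the deferral described above can be modelled in a
few lines of userspace Rust. This is a single-threaded toy under loud
assumptions, not the kernel's RCU and not a proposed kernel API: ToyRcu,
ReadGuard, read_lock() and this call_rcu() are hypothetical names, and
the toy defers callbacks until no reader at all is active, whereas real
RCU only waits for readers that were already inside a read-side
critical section when call_rcu() was invoked.

```rust
use std::cell::{Cell, RefCell};

/// Hypothetical toy model of an RCU domain (single-threaded, userspace).
pub struct ToyRcu {
    /// Number of currently active read-side critical sections.
    readers: Cell<usize>,
    /// Callbacks queued by call_rcu() that have not run yet.
    deferred: RefCell<Vec<Box<dyn FnOnce()>>>,
}

/// Hypothetical guard standing in for rcu_read_lock()/rcu_read_unlock().
pub struct ReadGuard<'a> {
    rcu: &'a ToyRcu,
}

impl ToyRcu {
    pub fn new() -> Self {
        ToyRcu {
            readers: Cell::new(0),
            deferred: RefCell::new(Vec::new()),
        }
    }

    /// Models rcu_read_lock(): enter a read-side critical section.
    pub fn read_lock(&self) -> ReadGuard<'_> {
        self.readers.set(self.readers.get() + 1);
        ReadGuard { rcu: self }
    }

    /// Models call_rcu(): queue `f`, run it once no reader is active.
    pub fn call_rcu(&self, f: impl FnOnce() + 'static) {
        self.deferred.borrow_mut().push(Box::new(f));
        self.run_if_quiescent();
    }

    /// Runs all queued callbacks if there is no active reader.
    fn run_if_quiescent(&self) {
        if self.readers.get() == 0 {
            // Take the queue first so the RefCell borrow ends before
            // any callback runs.
            let callbacks = std::mem::take(&mut *self.deferred.borrow_mut());
            for cb in callbacks {
                cb();
            }
        }
    }
}

impl Drop for ReadGuard<'_> {
    /// Models rcu_read_unlock(): leave the critical section and let
    /// deferred callbacks run if this was the last reader.
    fn drop(&mut self) {
        self.rcu.readers.set(self.rcu.readers.get() - 1);
        self.rcu.run_if_quiescent();
    }
}
```

In this model, a callback queued while a ReadGuard is alive (standing in
for delayed_put_pid()) does not run until the guard is dropped, which is
the property that makes the pid namespace safe to look at inside the
read-side section without taking an extra reference.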