On Wed, Feb 07, 2024 at 10:29:51AM +0100, Christian Brauner wrote:
> On Wed, Feb 07, 2024 at 10:15:25AM +0100, Christian Brauner wrote:
> > On Tue, Feb 06, 2024 at 12:23:57PM -0700, Tycho Andersen wrote:
> > > From: Tycho Andersen <tandersen@xxxxxxxxxxx>
> > >
> > > We can get EBADF from __pidfd_fget() if a task is currently exiting, which
> > > might be confusing. Let's check PF_EXITING, and just report ESRCH if so.
> > >
> > > I chose PF_EXITING, because it is set in exit_signals(), which is called
> > > before exit_files(). Since ->exit_status is mostly set after exit_files()
> > > in exit_notify(), using that still leaves a window open for the race.
> > >
> > > Signed-off-by: Tycho Andersen <tandersen@xxxxxxxxxxx>
> > > v2: fix a race in the check by putting the check after __pidfd_fget()
> > >     (thanks Oleg)
> > > ---
> > >  kernel/pid.c                                  | 17 +++++++++-
> > >  .../selftests/pidfd/pidfd_getfd_test.c        | 31 ++++++++++++++++++-
> > >  2 files changed, 46 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/kernel/pid.c b/kernel/pid.c
> > > index de0bf2f8d18b..a8cd6296ed6d 100644
> > > --- a/kernel/pid.c
> > > +++ b/kernel/pid.c
> > > @@ -693,8 +693,23 @@ static int pidfd_getfd(struct pid *pid, int fd)
> > >
> > >  	file = __pidfd_fget(task, fd);
> > >  	put_task_struct(task);
> > > -	if (IS_ERR(file))
> > > +	if (IS_ERR(file)) {
> > > +		/*
> > > +		 * It is possible that the target thread is exiting; it can be
> > > +		 * either:
> > > +		 * 1. before exit_signals(), which gives a real fd
> > > +		 * 2. before exit_files() takes the task_lock() gives a real fd
> > > +		 * 3. after exit_files() releases task_lock(), ->files is NULL;
> > > +		 *    this has PF_EXITING, since it was set in exit_signals(),
> > > +		 *    __pidfd_fget() returns EBADF.
> > > +		 * In case 3 we get EBADF, but that really means ESRCH, since
> > > +		 * the task is currently exiting and has freed its files
> > > +		 * struct, so we fix it up.
> > > +		 */
> > > +		if (task->flags & PF_EXITING && PTR_ERR(file) == -EBADF)
> > > +			return -ESRCH;
> >
> > Isn't that a potential UAF because we called put_task_struct() above but
> > this is exiting task->flags afterwards?
>
> s/exiting/accessing/

So this is what I have applied currently where I moved the check into
__pidfd_fget() where it makes more sense imho. But please double check
that I didn't mess anything up:

From 7ab8f833aceb11c78627f4ea5d7e354314efa385 Mon Sep 17 00:00:00 2001
From: Tycho Andersen <tandersen@xxxxxxxxxxx>
Date: Wed, 7 Feb 2024 10:19:29 +0100
Subject: [PATCH 1/2] pidfd: getfd should always report ESRCH if a task is exiting

We can get EBADF from pidfd_getfd() if a task is currently exiting, which
might be confusing. Let's check PF_EXITING, and just report ESRCH if so.

I chose PF_EXITING, because it is set in exit_signals(), which is called
before exit_files(). Since ->exit_status is mostly set after exit_files()
in exit_notify(), using that still leaves a window open for the race.

Signed-off-by: Tycho Andersen <tandersen@xxxxxxxxxxx>
Link: https://lore.kernel.org/r/20240206192357.81942-1-tycho@tycho.pizza
Signed-off-by: Christian Brauner <brauner@xxxxxxxxxx>
---
 kernel/pid.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/kernel/pid.c b/kernel/pid.c
index de0bf2f8d18b..c1d940fbd314 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -678,7 +678,26 @@ static struct file *__pidfd_fget(struct task_struct *task, int fd)
 
 	up_read(&task->signal->exec_update_lock);
 
-	return file ?: ERR_PTR(-EBADF);
+	if (!file) {
+		/*
+		 * It is possible that the target thread is exiting; it can be
+		 * either:
+		 * 1. before exit_signals(), which gives a real fd
+		 * 2. before exit_files() takes the task_lock() gives a real fd
+		 * 3. after exit_files() releases task_lock(), ->files is NULL;
+		 *    this has PF_EXITING, since it was set in exit_signals(),
+		 *    __pidfd_fget() returns EBADF.
+		 * In case 3 we get EBADF, but that really means ESRCH, since
+		 * the task is currently exiting and has freed its files
+		 * struct, so we fix it up.
+		 */
+		if (task->flags & PF_EXITING)
+			file = ERR_PTR(-ESRCH);
+		else
+			file = ERR_PTR(-EBADF);
+	}
+
+	return file;
 }
 
 static int pidfd_getfd(struct pid *pid, int fd)
-- 
2.43.0

From 43316ed070cd8fb02a51ea9577c5fc1fcf639652 Mon Sep 17 00:00:00 2001
From: Tycho Andersen <tandersen@xxxxxxxxxxx>
Date: Wed, 7 Feb 2024 10:19:44 +0100
Subject: [PATCH 2/2] selftests: add ESRCH tests for pidfd_getfd()

Ensure that pidfd_getfd() reports -ESRCH if the task is already exiting.

Signed-off-by: Tycho Andersen <tandersen@xxxxxxxxxxx>
Link: https://lore.kernel.org/r/20240206192357.81942-1-tycho@tycho.pizza
Signed-off-by: Christian Brauner <brauner@xxxxxxxxxx>
---
 .../selftests/pidfd/pidfd_getfd_test.c        | 31 ++++++++++++++++++-
 1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/pidfd/pidfd_getfd_test.c b/tools/testing/selftests/pidfd/pidfd_getfd_test.c
index 0930e2411dfb..cd51d547b751 100644
--- a/tools/testing/selftests/pidfd/pidfd_getfd_test.c
+++ b/tools/testing/selftests/pidfd/pidfd_getfd_test.c
@@ -5,6 +5,7 @@
 #include <fcntl.h>
 #include <limits.h>
 #include <linux/types.h>
+#include <poll.h>
 #include <sched.h>
 #include <signal.h>
 #include <stdio.h>
@@ -129,6 +130,7 @@ FIXTURE(child)
 	 * When it is closed, the child will exit.
 	 */
 	int sk;
+	bool ignore_child_result;
 };
 
 FIXTURE_SETUP(child)
@@ -165,10 +167,14 @@ FIXTURE_SETUP(child)
 
 FIXTURE_TEARDOWN(child)
 {
+	int ret;
+
 	EXPECT_EQ(0, close(self->pidfd));
 	EXPECT_EQ(0, close(self->sk));
 
-	EXPECT_EQ(0, wait_for_pid(self->pid));
+	ret = wait_for_pid(self->pid);
+	if (!self->ignore_child_result)
+		EXPECT_EQ(0, ret);
 }
 
 TEST_F(child, disable_ptrace)
@@ -235,6 +241,29 @@ TEST(flags_set)
 	EXPECT_EQ(errno, EINVAL);
 }
 
+TEST_F(child, no_strange_EBADF)
+{
+	struct pollfd fds;
+
+	self->ignore_child_result = true;
+
+	fds.fd = self->pidfd;
+	fds.events = POLLIN;
+
+	ASSERT_EQ(kill(self->pid, SIGKILL), 0);
+	ASSERT_EQ(poll(&fds, 1, 5000), 1);
+
+	/*
+	 * It used to be that pidfd_getfd() could race with the exiting thread
+	 * between exit_files() and release_task(), and get a non-null task
+	 * with a NULL files struct, and you'd get EBADF, which was slightly
+	 * confusing.
+	 */
+	errno = 0;
+	EXPECT_EQ(sys_pidfd_getfd(self->pidfd, self->remote_fd, 0), -1);
+	EXPECT_EQ(errno, ESRCH);
+}
+
 #if __NR_pidfd_getfd == -1
 int main(void)
 {
-- 
2.43.0
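
For reference, here is a rough userspace sketch of how a caller might tell
the two errors apart once PF_EXITING is reported as ESRCH. It is not part of
either patch. The names classify_getfd_errno(), dup_remote_fd() and
enum getfd_outcome are made up for illustration, and the fallback syscall
number 438 is an assumption that holds for the generic syscall table only.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SYS_pidfd_getfd
/* Assumption: generic syscall number; verify against your architecture. */
#define SYS_pidfd_getfd 438
#endif

/* Hypothetical classification of a pidfd_getfd() result. */
enum getfd_outcome {
	GETFD_OK,		/* got a duplicated fd */
	GETFD_TARGET_GONE,	/* ESRCH: target is exiting or already gone */
	GETFD_BAD_FD,		/* EBADF: pidfd or remote fd number is not valid */
	GETFD_OTHER_ERROR,	/* EPERM, EINVAL, EMFILE, ... */
};

/* Map an errno value (0 on success) to one of the outcomes above. */
enum getfd_outcome classify_getfd_errno(int err)
{
	switch (err) {
	case 0:
		return GETFD_OK;
	case ESRCH:
		return GETFD_TARGET_GONE;
	case EBADF:
		return GETFD_BAD_FD;
	default:
		return GETFD_OTHER_ERROR;
	}
}

/*
 * Thin wrapper around the raw syscall. Returns the new fd or -1, and
 * stores the classified outcome in *outcome.
 */
int dup_remote_fd(int pidfd, int remote_fd, enum getfd_outcome *outcome)
{
	int fd;

	assert(outcome != NULL);

	fd = (int)syscall(SYS_pidfd_getfd, pidfd, remote_fd, 0U);
	*outcome = classify_getfd_errno(fd < 0 ? errno : 0);
	return fd;
}
```

With something like this, a caller that sees GETFD_TARGET_GONE can stop
retrying and reap the child, while GETFD_BAD_FD can be treated as a genuine
"no such fd" answer rather than a side effect of the exit race.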