Re: [PATCH-tip] sched: Fix use-after-free bug in dup_user_cpus_ptr()

Will Deacon <will@xxxxxxxxxx> · Thu, 1 Dec 2022 13:44:46 +0000

On Sun, Nov 27, 2022 at 08:44:41PM -0500, Waiman Long wrote:
> Since commit 07ec77a1d4e8 ("sched: Allow task CPU affinity to be
> restricted on asymmetric systems"), the setting and clearing of
> user_cpus_ptr are done under pi_lock for arm64 architecture. However,
> dup_user_cpus_ptr() accesses user_cpus_ptr without any lock
> protection. When racing with the clearing of user_cpus_ptr in
> __set_cpus_allowed_ptr_locked(), it can lead to user-after-free and
> double-free in arm64 kernel.
> 
> Commit 8f9ea86fdf99 ("sched: Always preserve the user requested
> cpumask") fixes this problem as user_cpus_ptr, once set, will never
> be cleared in a task's lifetime. However, this bug was re-introduced
> in commit 851a723e45d1 ("sched: Always clear user_cpus_ptr in
> do_set_cpus_allowed()") which allows the clearing of user_cpus_ptr in
> do_set_cpus_allowed(). This time, it will affect all arches.
> 
> Fix this bug by always clearing the user_cpus_ptr of the newly
> cloned/forked task before the copying process starts and check the
> user_cpus_ptr state of the source task under pi_lock.
> 
> Note to stable, this patch won't be applicable to stable releases.
> Just copy the new dup_user_cpus_ptr() function over.
> 
> Fixes: 07ec77a1d4e8 ("sched: Allow task CPU affinity to be restricted on asymmetric systems")
> Fixes: 851a723e45d1 ("sched: Always clear user_cpus_ptr in do_set_cpus_allowed()")
> CC: stable@xxxxxxxxxxxxxxx
> Reported-by: David Wang 王标 <wangbiao3@xxxxxxxxxx>
> Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
> ---
>  kernel/sched/core.c | 32 ++++++++++++++++++++++++++++----
>  1 file changed, 28 insertions(+), 4 deletions(-)

As per my comments on the previous version of this patch:

https://lore.kernel.org/lkml/20221201133602.GB28489@willie-the-truck/T/#t

I think there are other issues to fix when racing affinity changes with
fork() too.

> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 8df51b08bb38..f2b75faaf71a 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2624,19 +2624,43 @@ void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
>  int dup_user_cpus_ptr(struct task_struct *dst, struct task_struct *src,
>  		      int node)
>  {
> +	cpumask_t *user_mask;
>  	unsigned long flags;
>  
> +	/*
> +	 * Always clear dst->user_cpus_ptr first as their user_cpus_ptr's
> +	 * may differ by now due to racing.
> +	 */
> +	dst->user_cpus_ptr = NULL;
> +
> +	/*
> +	 * This check is racy and losing the race is a valid situation.
> +	 * It is not worth the extra overhead of taking the pi_lock on
> +	 * every fork/clone.
> +	 */
>  	if (!src->user_cpus_ptr)
>  		return 0;

data_race() ?

>  
> -	dst->user_cpus_ptr = kmalloc_node(cpumask_size(), GFP_KERNEL, node);
> -	if (!dst->user_cpus_ptr)
> +	user_mask = kmalloc_node(cpumask_size(), GFP_KERNEL, node);
> +	if (!user_mask)
>  		return -ENOMEM;
>  
> -	/* Use pi_lock to protect content of user_cpus_ptr */
> +	/*
> +	 * Use pi_lock to protect content of user_cpus_ptr
> +	 *
> +	 * Though unlikely, user_cpus_ptr can be reset to NULL by a concurrent
> +	 * do_set_cpus_allowed().
> +	 */
>  	raw_spin_lock_irqsave(&src->pi_lock, flags);
> -	cpumask_copy(dst->user_cpus_ptr, src->user_cpus_ptr);
> +	if (src->user_cpus_ptr) {
> +		swap(dst->user_cpus_ptr, user_mask);

Isn't 'dst->user_cpus_ptr' always NULL here? Why do we need the swap()
instead of just assigning the thing directly?

Will