On Sun, Nov 27, 2022 at 08:44:41PM -0500, Waiman Long wrote:
> Since commit 07ec77a1d4e8 ("sched: Allow task CPU affinity to be
> restricted on asymmetric systems"), the setting and clearing of
> user_cpus_ptr are done under pi_lock for arm64 architecture. However,
> dup_user_cpus_ptr() accesses user_cpus_ptr without any lock
> protection. When racing with the clearing of user_cpus_ptr in
> __set_cpus_allowed_ptr_locked(), it can lead to use-after-free and
> double-free in arm64 kernel.
>
> Commit 8f9ea86fdf99 ("sched: Always preserve the user requested
> cpumask") fixes this problem as user_cpus_ptr, once set, will never
> be cleared in a task's lifetime. However, this bug was re-introduced
> in commit 851a723e45d1 ("sched: Always clear user_cpus_ptr in
> do_set_cpus_allowed()") which allows the clearing of user_cpus_ptr in
> do_set_cpus_allowed(). This time, it will affect all arches.
>
> Fix this bug by always clearing the user_cpus_ptr of the newly
> cloned/forked task before the copying process starts and check the
> user_cpus_ptr state of the source task under pi_lock.
>
> Note to stable, this patch won't be applicable to stable releases.
> Just copy the new dup_user_cpus_ptr() function over.
>
> Fixes: 07ec77a1d4e8 ("sched: Allow task CPU affinity to be restricted on asymmetric systems")
> Fixes: 851a723e45d1 ("sched: Always clear user_cpus_ptr in do_set_cpus_allowed()")
> CC: stable@xxxxxxxxxxxxxxx
> Reported-by: David Wang 王标 <wangbiao3@xxxxxxxxxx>
> Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
> ---
>  kernel/sched/core.c | 32 ++++++++++++++++++++++++++++----
>  1 file changed, 28 insertions(+), 4 deletions(-)

As per my comments on the previous version of this patch:

https://lore.kernel.org/lkml/20221201133602.GB28489@willie-the-truck/T/#t

I think there are other issues to fix when racing affinity changes with
fork() too.

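For reference, the scheme the commit message describes can be modelled in
userspace roughly as follows. This is only a sketch under stated assumptions,
not the kernel code: a pthread mutex stands in for pi_lock, a single
unsigned long stands in for a cpumask, and every name (model_task,
model_dup_user_mask) is hypothetical. The unlocked peek uses the GCC/Clang
__atomic_load_n builtin as a rough stand-in for an intentionally racy read.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct model_task {
	pthread_mutex_t lock;		/* stands in for pi_lock */
	unsigned long *user_mask;	/* stands in for user_cpus_ptr */
};

static int model_dup_user_mask(struct model_task *dst, struct model_task *src)
{
	unsigned long *new_mask;

	assert(dst && src);

	/* dst may hold a stale copy of src's pointer: drop it, never free it. */
	dst->user_mask = NULL;

	/* Unlocked peek, racy by design; it only avoids a needless allocation. */
	if (!__atomic_load_n(&src->user_mask, __ATOMIC_RELAXED))
		return 0;

	/* Allocate outside the lock so the critical section stays short. */
	new_mask = malloc(sizeof(*new_mask));
	if (!new_mask)
		return -ENOMEM;

	pthread_mutex_lock(&src->lock);
	/* Re-check under the lock: the source mask may have been cleared. */
	if (src->user_mask) {
		*new_mask = *src->user_mask;
		dst->user_mask = new_mask;
		new_mask = NULL;	/* ownership moved to dst */
	}
	pthread_mutex_unlock(&src->lock);

	/* Frees the unused allocation if the race was lost; free(NULL) is a no-op. */
	free(new_mask);
	return 0;
}
```

The point of the model is that the pointer seen by the unlocked check is never
dereferenced; only the value observed under the lock is copied.
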
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 8df51b08bb38..f2b75faaf71a 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2624,19 +2624,43 @@ void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
>  int dup_user_cpus_ptr(struct task_struct *dst, struct task_struct *src,
>  		      int node)
>  {
> +	cpumask_t *user_mask;
>  	unsigned long flags;
>
> +	/*
> +	 * Always clear dst->user_cpus_ptr first as their user_cpus_ptr's
> +	 * may differ by now due to racing.
> +	 */
> +	dst->user_cpus_ptr = NULL;
> +
> +	/*
> +	 * This check is racy and losing the race is a valid situation.
> +	 * It is not worth the extra overhead of taking the pi_lock on
> +	 * every fork/clone.
> +	 */
>  	if (!src->user_cpus_ptr)
>  		return 0;

data_race() ?

>
> -	dst->user_cpus_ptr = kmalloc_node(cpumask_size(), GFP_KERNEL, node);
> -	if (!dst->user_cpus_ptr)
> +	user_mask = kmalloc_node(cpumask_size(), GFP_KERNEL, node);
> +	if (!user_mask)
>  		return -ENOMEM;
>
> -	/* Use pi_lock to protect content of user_cpus_ptr */
> +	/*
> +	 * Use pi_lock to protect content of user_cpus_ptr
> +	 *
> +	 * Though unlikely, user_cpus_ptr can be reset to NULL by a concurrent
> +	 * do_set_cpus_allowed().
> +	 */
>  	raw_spin_lock_irqsave(&src->pi_lock, flags);
> -	cpumask_copy(dst->user_cpus_ptr, src->user_cpus_ptr);
> +	if (src->user_cpus_ptr) {
> +		swap(dst->user_cpus_ptr, user_mask);

Isn't 'dst->user_cpus_ptr' always NULL here? Why do we need the swap() instead
of just assigning the thing directly?

Will
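
To make the swap() question concrete, here is a small standalone sketch of the
two ways of handing the new allocation over. It assumes that whatever is left
in user_mask gets freed once the lock is dropped, which the quoted hunk does
not show, so treat that as a guess. MODEL_SWAP and both helper names are
hypothetical, and __typeof__ is a GNU C extension.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's swap(); relies on GNU __typeof__. */
#define MODEL_SWAP(a, b) \
	do { __typeof__(a) tmp_ = (a); (a) = (b); (b) = tmp_; } while (0)

/* Variant from the patch: exchange the two pointers. */
static void hand_over_with_swap(unsigned long **dst, unsigned long **new_mask)
{
	assert(*dst == NULL);	/* dst was cleared at function entry */
	MODEL_SWAP(*dst, *new_mask);
}

/* Direct assignment: the local must be cleared by hand afterwards. */
static void hand_over_with_assign(unsigned long **dst, unsigned long **new_mask)
{
	assert(*dst == NULL);
	*dst = *new_mask;
	*new_mask = NULL;	/* otherwise a later free() would hit dst's mask */
}
```

With dst known to be NULL, both variants leave the same state behind. The only
thing swap() buys is that the local pointer is cleared implicitly.
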