On 22 Sep 2022 13:40:47 +0800 Jing-Ting Wu <jing-ting.wu@xxxxxxxxxxxx> wrote > > Because Peter have some concern for patch by Waiman. > We add Hillf's patch to our stability test. > But there are side effects after patched. > The warning appear once < two weeks. Thanks for your test. Any other effects observed? > > Backtrace as follows: > [name:panic&]WARNING: CPU: 6 PID: 32583 at affine_move_task > pc : affine_move_task > lr : __set_cpus_allowed_ptr_locked > Call trace: > affine_move_task > __set_cpus_allowed_ptr_locked > migrate_enable > __cgroup_bpf_run_filter_skb > ip_finish_output > ip_output > > > The root cause is when is_migration_disabled(p) is true,the patched > version will set p->migration_pending to NULL by migration_cpu_stop. > And in affine_move_task will raise a WARN_ON_ONCE(!pending). > > Kernel-5.15/kernel/sched/core.c: > static int affine_move_task(struct rq *rq, struct task_struct *p, > struct rq_flags *rf, int dest_cpu, unsigned int flags) { > ... > If (WARN_ON_ONCE(!pending)) { > Task_rq_unlock(rq,p,fr); > return -EINVAL; > } > ... > } > > But the tasks have not been migrated to the new affinity CPU, so there > should be pending tasks to be processed, so p->migration_pending should > not be NULL. > > > > Without patch: > When is_migration_disabled is true, then goto out and not set p- > >migration_pending to NULL. > > static int migration_cpu_stop(void *data) { > ... > If (task_rq(p) == rq) { > if (is_migration_disabled(p)) > goto out; > } > ... > } > > > With patch: > When is_migration_disabled is true and pending is true, goto else if > flow. Because p->cpus_ptr not updated when migrate_disable, so this > condition is always true and p->migration_pending will set to NULL. > > static int migration_cpu_stop(void *data) { > ... > If (task_rq(p) == rq && !is_migration_disabled(p) ) { > ... > } else if (pending) { > ... /* * The task moved before the stopper got to run. We're holding * ->pi_lock, so the allowed mask is stable - if it got * somewhere allowed, we're done. */ > If (cpumask_test_cpu(task_cpu(p), p-> cpus_ ptr)) { > p->migration_pending = NULL; > complete = true; > goto out; > } > ... > } Given p->migration_pending reset in case of job done, the warning you saw is benign without other negative effects observed. It should be fixed (example by simply cutting it) if you have a reproducer on top of the mainline tree. Hillf