Hi,

Did some quick testing with bpftrace, and it looks like it is
"update_tg_cfs_load" that does this:

kretfunc:attach_entity_load_avg:
cfs_rq: 0xffff8a3e6773e400, se.load_sum: 0, se.load_avg: 0,
se->load.weight: 1048576
diff.sum: 0, diff.load: 0, new_sum: 0, new_load: 0, period_contrib: 821

// quite some time passes

kretfunc:update_tg_cfs_load:
cfs_rq: 0xffff8a3e6773e400, old load_sum 0, old load_avg: 0,
new load_sum: 47876096, new load_avg: 1022
tg_load_avg_contrib: 0, period_contrib: 82, on_list: 0, throttled: 0/0
stack:
        bpf_prog_a37cf01922e02958_update_tg_cfs_l+504
        bpf_get_stackid_raw_tp+115
        bpf_prog_a37cf01922e02958_update_tg_cfs_l+504
        bpf_trampoline_6442466486_1+144
        update_tg_cfs_load+5
        update_load_avg+450
        enqueue_entity+91
        enqueue_task_fair+134
        move_queued_task+172
        affine_move_task+1282
        __set_cpus_allowed_ptr+317
        update_tasks_cpumask+70
        update_cpumasks_hier+448
        update_cpumask+345
        cpuset_write_resmask+796
        cgroup_file_write+162
        kernfs_fop_write_iter+284
        new_sync_write+345
        vfs_write+511
        ksys_write+103
        do_syscall_64+51
        entry_SYSCALL_64_after_hwframe+68

// There might be more stuff happening here.
// __update_load_avg_cfs_rq runs way too often to be able to trace it
// without filtering, and I didn't look into that (a possible filter is
// sketched below).

kretfunc:update_tg_cfs_load:
cfs_rq: 0xffff8a3e6773e400, old load_sum 48117265, old load_avg: 1025,
new load_sum: 0, new load avg: 2
tg_load_avg_contrib: 1022, period_contrib: 223, on_list: 1, throttled: 1/1
stack:
        bpf_prog_a37cf01922e02958_update_tg_cfs_l+504
        bpf_get_stackid_raw_tp+115
        bpf_prog_a37cf01922e02958_update_tg_cfs_l+504
        bpf_trampoline_6442466486_1+144
        update_tg_cfs_load+5
        update_load_avg+450
        update_blocked_averages+723
        newidle_balance+533
        pick_next_task_fair+57
        __schedule+376
        schedule+91
        schedule_hrtimeout_range_clock+164
        do_sys_poll+1043
        __x64_sys_poll+179
        do_syscall_64+51
        entry_SYSCALL_64_after_hwframe+68

This particular example resulted in a 75/25 share of CPU time for the
two groups.

From this, it also looks like it doesn't matter whether the new
load_avg is 0 or not (when the new load_sum is 0), since
tg_load_avg_contrib will not be properly removed either way (without
this patch). There might be some other precision issues here as well.

> I have added a trace in cfs_rq_is_decayed() but I'm not able to
> reproduce a situation where load_sum == 0 but not load_avg even with
> the script in the cover letter

Hmm, strange. I am able to reproduce on both v5.12 and v5.13. I am
running on a non-SMT 4-core CPU (4670k) with CONFIG_HZ=250. Since this
is timing sensitive, you might need to tweak the timings of either the
CFS bandwidth period/quota, or the time spent waiting between changing
which CPU the task runs on. If you have more than 4 CPUs, you can also
try starting on CPU 0, moving the task through all the cores, and then
onto CPU n. Maybe that makes it possible to reproduce. A probe that
flags the load_sum == 0 / load_avg != 0 state directly is also sketched
below.

Since bpftrace doesn't work properly on v5.13, this particular test was
done on v5.12 with a cherry-pick of
0258bdfaff5bd13c4d2383150b7097aecd6b6d82 and the other patch in this
patchset.
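Regarding the filtering mentioned above: an untested sketch, but
something like this might work for following __update_load_avg_cfs_rq
on a single cfs_rq by filtering on the pointer from earlier trace
output (the address is per boot/run, so it has to be taken from a
previous run first, and I haven't verified that this function is
attachable on all builds):

kretfunc:__update_load_avg_cfs_rq
/(uint64)args->cfs_rq == 0xffff8a3e6773e400/
{
	printf("%s: cfs_rq: %p, load_sum: %llu, load_avg: %llu, period_contrib: %llu\n",
	       probe, args->cfs_rq, args->cfs_rq->avg.load_sum,
	       args->cfs_rq->avg.load_avg,
	       args->cfs_rq->avg.period_contrib);
}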
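And since cfs_rq_is_decayed() is a "static inline" and therefore hard
to probe directly, an (untested) alternative for catching the state in
question could be to flag it at update_tg_cfs_load exit, in the same
way the script below dumps the full state:

kretfunc:update_tg_cfs_load
/args->cfs_rq->avg.load_sum == 0 && args->cfs_rq->avg.load_avg != 0/
{
	printf("cfs_rq: %p, load_sum == 0 but load_avg: %llu, tg_load_avg_contrib: %llu\n",
	       args->cfs_rq, args->cfs_rq->avg.load_avg,
	       args->cfs_rq->tg_load_avg_contrib);
}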
testscript.bt:

kfunc:attach_entity_load_avg
{
	@a_sum[tid] = args->cfs_rq->avg.load_sum;
	@a_load[tid] = args->cfs_rq->avg.load_avg;
}

kretfunc:attach_entity_load_avg
{
	printf("%s: cfs_rq: %p, se.load_sum: %llu, se.load_avg: %llu, se->load.weight: %llu diff.sum: %llu, diff.load: %llu, new_sum: %llu, new_load: %llu, period_contrib: %llu\n",
	       probe, args->cfs_rq, args->se->avg.load_sum,
	       args->se->avg.load_avg, args->se->load.weight,
	       args->cfs_rq->avg.load_sum - @a_sum[tid],
	       args->cfs_rq->avg.load_avg - @a_load[tid],
	       args->cfs_rq->avg.load_sum, args->cfs_rq->avg.load_avg,
	       args->cfs_rq->avg.period_contrib);
}

kfunc:update_tg_cfs_load
{
	@beg_load_avg[tid] = args->cfs_rq->avg.load_avg;
	@beg_load_sum[tid] = args->cfs_rq->avg.load_sum;
}

kretfunc:update_tg_cfs_load
{
	printf("%s: cfs_rq: %p, old load_sum %llu, old load_avg: %llu, new load_sum: %llu, new load_avg: %llu tg_load_avg_contrib: %llu, period_contrib: %llu, on_list: %d, throttled: %d/%d stack: %s\n",
	       probe, args->cfs_rq, @beg_load_sum[tid], @beg_load_avg[tid],
	       args->cfs_rq->avg.load_sum, args->cfs_rq->avg.load_avg,
	       args->cfs_rq->tg_load_avg_contrib,
	       args->cfs_rq->avg.period_contrib, args->cfs_rq->on_list,
	       args->cfs_rq->throttled, args->cfs_rq->throttle_count,
	       kstack);
}

Thanks
Odin