On Sun, Sep 10, 2023 at 07:37:13PM -0400, Joel Fernandes wrote: > On Sun, Sep 10, 2023 at 5:16 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote: > > > > On Sun, Sep 10, 2023 at 08:14:45PM +0000, Joel Fernandes wrote: > [...] > > > > I have been running into another intermittent one as well which > > > > is the boost failure and that happens once in 10-15 runs or so. > > > > > > > > I was thinking of running the following configuration on an automated > > > > regular basis to at least provide a better clue on the lucky run that > > > > catches an issue. But then the issue is it would change timing enough > > > > to maybe hide bugs. I could also make it submit logs automatically to > > > > the list on such occurrences, but one step at a time and all that. I > > > > do need to add (hopefully less noisy) tick/timer related trace events. > > > > > > > > # Define the bootargs array > > > > bootargs=( > > > > "ftrace_dump_on_oops" > > > > "panic_on_warn=1" > > > > "sysctl.kernel.panic_on_rcu_stall=1" > > > > "sysctl.kernel.max_rcu_stall_to_panic=1" > > > > "trace_buf_size=10K" > > > > "traceoff_on_warning=1" > > > > "panic_print=0x1f" # To dump held locks, mem and other info. > > > > ) > > > > # Define the trace events array passed to bootargs. > > > > trace_events=( > > > > "sched:sched_switch" > > > > "sched:sched_waking" > > > > "rcu:rcu_callback" > > > > "rcu:rcu_fqs" > > > > "rcu:rcu_quiescent_state_report" > > > > "rcu:rcu_grace_period" > > > > ) > > > > > > So some insight on this boost failure. Just before the boost failures are > > > reported, I see the migration thread interferring with the rcu_preempt thread > > > (aka GP kthread). See trace below. Of note is that the rcu_preempt thread is > > > runnable while context switching, which means its execution is interferred. > > > The rcu_preempt thread is at RT prio 2 as can be seen. > > > > > > So some open-ended questions: what exactly does the migration thread want, > > > this is something related to CPU hotplug? And if the migration thread had to > > > run, why did the rcu_preempt thread not get pushed to another CPU by the > > > scheduler? We have 16 vCPUs for this test. > > > > Maybe we need a cpus_read_lock() before doing a given boost-test interval > > and a cpus_read_unlock() after finishing one? But much depends on > > exactly what is starting those migration threads. > > But in the field, a real RT task can preempt a reader without doing > cpus_read_lock() and may run into a similar boost issue? > > > Then again, TREE03 is pretty aggressive about doing CPU hotplug. > > Ok. I put a trace_printk() in the stopper thread to see what the > ->fn() is. I'm doing another run to see what falls out. I think I am getting somewhere, It looks like the migration thread tries to push the task out due to __balance_push_cpu_stop. Clearly it fails because the rcu_preempt task is still on CPU 3. And the whole thing is live locked retrying the same. I am not sure if the migration thread is trying to migrate rcu_preempt because the CPU is going down or because a higher priority RT task is running. rcu_pree-13 3d..4. 2936723336us : sched_waking: comm=migration/3 pid=34 prio=0 target_cpu=003 rcu_pree-13 3d..2. 2936723338us : sched_switch: prev_comm=rcu_preempt prev_pid=13 prev_prio=97 prev_state=R ==> next_comm=migration/3 next_pid=34 next_prio=0 migratio-34 3..... 2936723341us : cpu_stopper_thread: cpu_stop: caller=balance_push+0xc9/0x150 fn=__balance_push_cpu_stop+0x0/0x1c0 on cpu 3 migratio-34 3d..2. 2936723345us : sched_switch: prev_comm=migration/3 prev_pid=34 prev_prio=0 prev_state=S ==> next_comm=swapper/3 next_pid=0 next_prio=120 <idle>-0 3d..2. 2936723676us : sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=rcu_preempt next_pid=13 next_prio=97 rcu_pree-13 3d..4. 2936723678us : sched_waking: comm=migration/3 pid=34 prio=0 target_cpu=003 rcu_pree-13 3d..2. 2936723680us : sched_switch: prev_comm=rcu_preempt prev_pid=13 prev_prio=97 prev_state=R ==> next_comm=migration/3 next_pid=34 next_prio=0 migratio-34 3..... 2936723683us : cpu_stopper_thread: cpu_stop: caller=balance_push+0xc9/0x150 fn=__balance_push_cpu_stop+0x0/0x1c0 on cpu 3 I am digging deeper to see why the rcu_preempt thread cannot be pushed out and then I'll also look at why is it being pushed out in the first place. At least I have a strong repro now running 5 instances of TREE03 in parallel for several hours. thanks, - Joel