Re: [BUG almost bisected] Splat in dequeue_rt_stack() and build error

Tomas Glozar <tglozar@xxxxxxxxxx> · Thu, 10 Oct 2024 13:24:11 +0200

On Wed, 2 Oct 2024 at 11:01, Tomas Glozar <tglozar@xxxxxxxxxx> wrote:
>
> FYI I have managed to reproduce the bug on our infrastructure after 21
> hours of 7*TREE03 and I will continue with trying to reproduce it with
> the tracers we want.
>
> Tomas

I have now also reproduced the bug with the tracers active, after a
few 8-hour test runs on our infrastructure:

[    0.000000] Linux version 6.11.0-g2004cef11ea0-dirty (...) #1 SMP PREEMPT_DYNAMIC Wed Oct  9 12:13:40 EDT 2024
[    0.000000] Command line: debug_boot_weak_hash panic=-1 selinux=0
initcall_debug debug console=ttyS0 rcutorture.n_barrier_cbs=4
rcutorture.stat_interval=15 rcutorture.shutdown_secs=25200
rcutorture.test_no_idle_hz=1 rcutorture.verbose=1
rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30
rcutree.gp_preinit_delay=12 rcutree.gp_init_delay=3
rcutree.gp_cleanup_delay=3 rcutree.kthread_prio=2 threadirqs
rcutree.use_softirq=0
trace_event=sched:sched_switch,sched:sched_wakeup
ftrace_filter=dl_server_start,dl_server_stop trace_buf_size=2k
ftrace=function torture.ftrace_dump_at_shutdown=1
...
[13550.127541] WARNING: CPU: 1 PID: 155 at kernel/sched/deadline.c:1971 enqueue_dl_entity+0x554/0x5d0
[13550.128982] Modules linked in:
[13550.129528] CPU: 1 UID: 0 PID: 155 Comm: rcu_torture_rea Tainted: G        W          6.11.0-g2004cef11ea0-dirty #1
[13550.131419] Tainted: [W]=WARN
[13550.131979] Hardware name: Red Hat KVM/RHEL, BIOS 1.16.3-2.el9 04/01/2014
[13550.133230] RIP: 0010:enqueue_dl_entity+0x554/0x5d0
...
[13550.151286] Call Trace:
[13550.151749]  <TASK>
[13550.152141]  ? __warn+0x88/0x130
[13550.152717]  ? enqueue_dl_entity+0x554/0x5d0
[13550.153485]  ? report_bug+0x18e/0x1a0
[13550.154149]  ? handle_bug+0x54/0x90
[13550.154792]  ? exc_invalid_op+0x18/0x70
[13550.155484]  ? asm_exc_invalid_op+0x1a/0x20
[13550.156249]  ? enqueue_dl_entity+0x554/0x5d0
[13550.157055]  dl_server_start+0x36/0xf0
[13550.157709]  enqueue_task_fair+0x220/0x6b0
[13550.158447]  activate_task+0x26/0x60
[13550.159131]  attach_task+0x35/0x50
[13550.159756]  sched_balance_rq+0x663/0xe00
[13550.160511]  sched_balance_newidle.constprop.0+0x1a5/0x360
[13550.161520]  pick_next_task_fair+0x2f/0x340
[13550.162290]  __schedule+0x203/0x900
[13550.162958]  ? enqueue_hrtimer+0x35/0x90
[13550.163703]  schedule+0x27/0xd0
[13550.164299]  schedule_hrtimeout_range_clock+0x99/0x120
[13550.165239]  ? __pfx_hrtimer_wakeup+0x10/0x10
[13550.165954]  torture_hrtimeout_us+0x7b/0xe0
[13550.166624]  rcu_torture_reader+0x139/0x200
[13550.167284]  ? __pfx_rcu_torture_timer+0x10/0x10
[13550.168019]  ? __pfx_rcu_torture_reader+0x10/0x10
[13550.168764]  kthread+0xd6/0x100
[13550.169262]  ? __pfx_kthread+0x10/0x10
[13550.169860]  ret_from_fork+0x34/0x50
[13550.170424]  ? __pfx_kthread+0x10/0x10
[13550.171020]  ret_from_fork_asm+0x1a/0x30
[13550.171657]  </TASK>

Unfortunately, the RCU stalls that followed appear to have caused the
VM to terminate abnormally, so the ftrace buffer was never dumped to
the console. I am currently re-running the same test with
"ftrace_dump_on_oops panic_on_warn=1" added to the kernel command line
and hoping for the best.

Tomas