On Fri, Jul 29, 2022 at 12:25 PM Michel Lespinasse
<michel@xxxxxxxxxxxxxx> wrote:
On Thu, Jul 28, 2022 at 10:20:53AM -0700, Paul E. McKenney wrote:
On Mon, Jul 25, 2022 at 12:43:06PM -0700, Michel Lespinasse wrote:
On Wed, Jun 08, 2022 at 04:27:27PM +0200, Peter Zijlstra wrote:
Commit c227233ad64c ("intel_idle: enable interrupts before C1 on
Xeons") wrecked intel_idle in two ways:
- must not have tracing in idle functions
- must return with IRQs disabled
Additionally, it added a branch for no good reason.
Fixes: c227233ad64c ("intel_idle: enable interrupts before C1 on Xeons")
Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
After this change was introduced, I am seeing "WARNING: suspicious RCU
usage" when booting a kernel with debug options compiled in. Please
see the attached dmesg output. The issue starts with commit 32d4fd5751ea
and is still present in v5.19-rc8.
I'm not sure, is this too late to fix or revert in v5.19 final ?
I finally got a chance to take a quick look at this.
The rcu_eqs_exit() function is making a lockdep complaint about
being invoked with interrupts enabled. This function is called from
rcu_idle_exit(), which is an expected code path from cpuidle_enter_state()
via its call to rcu_idle_exit(). Except that rcu_idle_exit() disables
interrupts before invoking rcu_eqs_exit().
The only other call to rcu_idle_exit() does not disable interrupts,
but it is via rcu_user_exit(), which would be a very odd choice for
cpuidle_enter_state().
It seems unlikely, but it might be that it is the use of local_irq_save()
instead of raw_local_irq_save() within rcu_idle_exit() that is causing
the trouble. If this is the case, then the commit shown below would
help. Note that this commit removes the warning from lockdep, so it
is necessary to build the kernel with CONFIG_RCU_EQS_DEBUG=y to enable
equivalent debugging.
Could you please try your test with the -rce commit shown below applied?
Thanks for looking into it.
After checking out Peter's commit 32d4fd5751ea,
cherry picking your commit ed4ae5eff4b3,
and setting CONFIG_RCU_EQS_DEBUG=y in addition of my usual debug config,
I am now seeing this a few seconds into the boot:
[ 3.010650] ------------[ cut here ]------------
[ 3.010651] WARNING: CPU: 0 PID: 0 at kernel/sched/clock.c:397 sched_clock_tick+0x27/0x60
[ 3.010657] Modules linked in:
[ 3.010660] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.19.0-rc1-test-00005-g1be22fea0611 #1
[ 3.010662] Hardware name: LENOVO 30BFS44D00/1036, BIOS S03KT51A 01/17/2022
[ 3.010663] RIP: 0010:sched_clock_tick+0x27/0x60
[ 3.010665] Code: 1f 40 00 53 eb 02 5b c3 66 90 8b 05 2f c3 40 01 85 c0 74 18 65 8b 05 60 88 8f 4e 85 c0 75 0d 65 8b 05 a9 85 8f 4e 85 c0 74 02 <0f> 0b e8 e2 6c 89 00 48 c7 c3 40 d5 02 00
89 c0 48 03 1c c5 c0 98
[ 3.010667] RSP: 0000:ffffffffb2803e28 EFLAGS: 00010002
[ 3.010670] RAX: 0000000000000001 RBX: ffffc8ce7fa07060 RCX: 0000000000000001
[ 3.010671] RDX: 0000000000000000 RSI: ffffffffb268dd21 RDI: ffffffffb269ab13
[ 3.010673] RBP: 0000000000000001 R08: ffffffffffc300d5 R09: 000000000002be80
[ 3.010674] R10: 000003625b53183a R11: ffffa012b802b7a4 R12: ffffffffb2aa9e80
[ 3.010675] R13: ffffffffb2aa9e00 R14: 0000000000000001 R15: 0000000000000000
[ 3.010677] FS: 0000000000000000(0000) GS:ffffa012b8000000(0000) knlGS:0000000000000000
[ 3.010678] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3.010680] CR2: ffffa012f81ff000 CR3: 0000000c99612001 CR4: 00000000003706f0
[ 3.010681] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3.010682] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 3.010683] Call Trace:
[ 3.010685] <TASK>
[ 3.010688] cpuidle_enter_state+0xb7/0x4b0
[ 3.010694] cpuidle_enter+0x29/0x40
[ 3.010697] do_idle+0x1d4/0x210
[ 3.010702] cpu_startup_entry+0x19/0x20
[ 3.010704] rest_init+0x117/0x1a0
[ 3.010708] arch_call_rest_init+0xa/0x10
[ 3.010711] start_kernel+0x6d8/0x6ff
[ 3.010716] secondary_startup_64_no_verify+0xce/0xdb
[ 3.010728] </TASK>
[ 3.010729] irq event stamp: 44179
[ 3.010730] hardirqs last enabled at (44179): [<ffffffffb2000ccb>] asm_sysvec_apic_timer_interrupt+0x1b/0x20
[ 3.010734] hardirqs last disabled at (44177): [<ffffffffb22003f0>] __do_softirq+0x3f0/0x498
[ 3.010736] softirqs last enabled at (44178): [<ffffffffb2200332>] __do_softirq+0x332/0x498
[ 3.010738] softirqs last disabled at (44171): [<ffffffffb16c760b>] irq_exit_rcu+0xab/0xf0
[ 3.010741] ---[ end trace 0000000000000000 ]---
Can you please give this patch a go:
https://patchwork.kernel.org/project/linux-pm/patch/Yt/AxPFi88neW7W5@xxxxxxxxxxxxxxxxxxxxxxxxxx/
?