On Fri, Mar 05, 2021 at 12:21:40PM +0000, Chris Wilson wrote: > As previously noted in commit 66e4f4a9cc38 ("rtc: cmos: Use > spin_lock_irqsave() in cmos_interrupt()"): > > <4>[ 254.192378] WARNING: inconsistent lock state > <4>[ 254.192384] 5.12.0-rc1-CI-CI_DRM_9834+ #1 Not tainted > <4>[ 254.192396] -------------------------------- > <4>[ 254.192400] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage. > <4>[ 254.192409] rtcwake/5309 [HC0[0]:SC0[0]:HE1:SE1] takes: > <4>[ 254.192429] ffffffff8263c5f8 (rtc_lock){?...}-{2:2}, at: cmos_interrupt+0x18/0x100 > <4>[ 254.192481] {IN-HARDIRQ-W} state was registered at: > <4>[ 254.192488] lock_acquire+0xd1/0x3d0 > <4>[ 254.192504] _raw_spin_lock+0x2a/0x40 > <4>[ 254.192519] cmos_interrupt+0x18/0x100 > <4>[ 254.192536] rtc_handler+0x1f/0xc0 > <4>[ 254.192553] acpi_ev_fixed_event_detect+0x109/0x13c > <4>[ 254.192574] acpi_ev_sci_xrupt_handler+0xb/0x28 > <4>[ 254.192596] acpi_irq+0x13/0x30 > <4>[ 254.192620] __handle_irq_event_percpu+0x43/0x2c0 > <4>[ 254.192641] handle_irq_event_percpu+0x2b/0x70 > <4>[ 254.192661] handle_irq_event+0x2f/0x50 > <4>[ 254.192680] handle_fasteoi_irq+0x9e/0x150 > <4>[ 254.192693] __common_interrupt+0x76/0x140 > <4>[ 254.192715] common_interrupt+0x96/0xc0 > <4>[ 254.192732] asm_common_interrupt+0x1e/0x40 > <4>[ 254.192750] _raw_spin_unlock_irqrestore+0x38/0x60 > <4>[ 254.192767] resume_irqs+0xba/0xf0 > <4>[ 254.192786] dpm_resume_noirq+0x245/0x3d0 > <4>[ 254.192811] suspend_devices_and_enter+0x230/0xaa0 > <4>[ 254.192835] pm_suspend.cold.8+0x301/0x34a > <4>[ 254.192859] state_store+0x7b/0xe0 > <4>[ 254.192879] kernfs_fop_write_iter+0x11d/0x1c0 > <4>[ 254.192899] new_sync_write+0x11d/0x1b0 > <4>[ 254.192916] vfs_write+0x265/0x390 > <4>[ 254.192933] ksys_write+0x5a/0xd0 > <4>[ 254.192949] do_syscall_64+0x33/0x80 > <4>[ 254.192965] entry_SYSCALL_64_after_hwframe+0x44/0xae > <4>[ 254.192986] irq event stamp: 43775 > <4>[ 254.192994] hardirqs last enabled at (43775): [<ffffffff81c00c42>] asm_sysvec_apic_timer_interrupt+0x12/0x20 > <4>[ 254.193023] hardirqs last disabled at (43774): [<ffffffff81aa691a>] sysvec_apic_timer_interrupt+0xa/0xb0 > <4>[ 254.193049] softirqs last enabled at (42548): [<ffffffff81e00342>] __do_softirq+0x342/0x48e > <4>[ 254.193074] softirqs last disabled at (42543): [<ffffffff810b45fd>] irq_exit_rcu+0xad/0xd0 > <4>[ 254.193101] > other info that might help us debug this: > <4>[ 254.193107] Possible unsafe locking scenario: > > <4>[ 254.193112] CPU0 > <4>[ 254.193117] ---- > <4>[ 254.193121] lock(rtc_lock); > <4>[ 254.193137] <Interrupt> > <4>[ 254.193142] lock(rtc_lock); > <4>[ 254.193156] > *** DEADLOCK *** > > <4>[ 254.193161] 6 locks held by rtcwake/5309: > <4>[ 254.193174] #0: ffff888104861430 (sb_writers#5){.+.+}-{0:0}, at: ksys_write+0x5a/0xd0 > <4>[ 254.193232] #1: ffff88810f823288 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0xe7/0x1c0 > <4>[ 254.193282] #2: ffff888100cef3c0 (kn->active#285 > <7>[ 254.192706] i915 0000:00:02.0: [drm:intel_modeset_setup_hw_state [i915]] [CRTC:51:pipe A] hw state readout: disabled > <4>[ 254.193307] ){.+.+}-{0:0}, at: kernfs_fop_write_iter+0xf0/0x1c0 > <4>[ 254.193333] #3: ffffffff82649fa8 (system_transition_mutex){+.+.}-{3:3}, at: pm_suspend.cold.8+0xce/0x34a > <4>[ 254.193387] #4: ffffffff827a2108 (acpi_scan_lock){+.+.}-{3:3}, at: acpi_suspend_begin+0x47/0x70 > <4>[ 254.193433] #5: ffff8881019ea178 (&dev->mutex){....}-{3:3}, at: device_resume+0x68/0x1e0 > <4>[ 254.193485] > stack backtrace: > <4>[ 254.193492] CPU: 1 PID: 5309 Comm: rtcwake Not tainted 5.12.0-rc1-CI-CI_DRM_9834+ #1 > <4>[ 254.193514] Hardware name: Google Soraka/Soraka, BIOS MrChromebox-4.10 08/25/2019 > <4>[ 254.193524] Call Trace: > <4>[ 254.193536] dump_stack+0x7f/0xad > <4>[ 254.193567] mark_lock.part.47+0x8ca/0xce0 > <4>[ 254.193604] __lock_acquire+0x39b/0x2590 > <4>[ 254.193626] ? asm_sysvec_apic_timer_interrupt+0x12/0x20 > <4>[ 254.193660] lock_acquire+0xd1/0x3d0 > <4>[ 254.193677] ? cmos_interrupt+0x18/0x100 > <4>[ 254.193716] _raw_spin_lock+0x2a/0x40 > <4>[ 254.193735] ? cmos_interrupt+0x18/0x100 > <4>[ 254.193758] cmos_interrupt+0x18/0x100 > <4>[ 254.193785] cmos_resume+0x2ac/0x2d0 > <4>[ 254.193813] ? acpi_pm_set_device_wakeup+0x1f/0x110 > <4>[ 254.193842] ? pnp_bus_suspend+0x10/0x10 > <4>[ 254.193864] pnp_bus_resume+0x5e/0x90 > <4>[ 254.193885] dpm_run_callback+0x5f/0x240 > <4>[ 254.193914] device_resume+0xb2/0x1e0 > <4>[ 254.193942] ? pm_dev_err+0x25/0x25 > <4>[ 254.193974] dpm_resume+0xea/0x3f0 > <4>[ 254.194005] dpm_resume_end+0x8/0x10 > <4>[ 254.194030] suspend_devices_and_enter+0x29b/0xaa0 > <4>[ 254.194066] pm_suspend.cold.8+0x301/0x34a > <4>[ 254.194094] state_store+0x7b/0xe0 > <4>[ 254.194124] kernfs_fop_write_iter+0x11d/0x1c0 > <4>[ 254.194151] new_sync_write+0x11d/0x1b0 > <4>[ 254.194183] vfs_write+0x265/0x390 > <4>[ 254.194207] ksys_write+0x5a/0xd0 > <4>[ 254.194232] do_syscall_64+0x33/0x80 > <4>[ 254.194251] entry_SYSCALL_64_after_hwframe+0x44/0xae > <4>[ 254.194274] RIP: 0033:0x7f07d79691e7 > <4>[ 254.194293] Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 > <4>[ 254.194312] RSP: 002b:00007ffd9cc2c768 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 > <4>[ 254.194337] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f07d79691e7 > <4>[ 254.194352] RDX: 0000000000000004 RSI: 0000556ebfc63590 RDI: 000000000000000b > <4>[ 254.194366] RBP: 0000556ebfc63590 R08: 0000000000000000 R09: 0000000000000004 > <4>[ 254.194379] R10: 0000556ebf0ec2a6 R11: 0000000000000246 R12: 0000000000000004 > > which breaks S3-resume on fi-kbl-soraka presumably as that's slow enough > to trigger the alarm during the suspend. > > Fixes: 6950d046eb6e ("rtc: cmos: Replace spin_lock_irqsave with spin_lock in hard IRQ") Sigh. I wish people would at least try to check the code/history before doing these blind "cleanups" :( > References: 66e4f4a9cc38 ("rtc: cmos: Use spin_lock_irqsave() in cmos_interrupt()"): > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > Cc: Xiaofei Tan <tanxiaofei@xxxxxxxxxx> > Cc: Alexandre Belloni <alexandre.belloni@xxxxxxxxxxx> > Cc: Alessandro Zummo <a.zummo@xxxxxxxxxxxx> > Cc: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx> > --- > drivers/rtc/rtc-cmos.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/drivers/rtc/rtc-cmos.c b/drivers/rtc/rtc-cmos.c > index 670fd8a2970e..6545afb2f20e 100644 > --- a/drivers/rtc/rtc-cmos.c > +++ b/drivers/rtc/rtc-cmos.c > @@ -1053,7 +1053,9 @@ static void cmos_check_wkalrm(struct device *dev) > * ACK the rtc irq here > */ > if (t_now >= cmos->alarm_expires && cmos_use_acpi_alarm()) { > + local_irq_disable(); > cmos_interrupt(0, (void *)cmos->rtc); > + local_irq_enable(); Yeah, given what's already happened this seems more likely to survive a bit longer. Reviewed-by: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx> > return; > } > > -- > 2.20.1 -- Ville Syrjälä Intel