Patch "clocksource/drivers/sh_cmt: Address race condition for clock events" has been added to the 6.6-stable tree

Sasha Levin <sashal@xxxxxxxxxx> · Sun, 11 Aug 2024 09:02:08 -0400

This is a note to let you know that I've just added the patch titled

    clocksource/drivers/sh_cmt: Address race condition for clock events

to the 6.6-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     clocksource-drivers-sh_cmt-address-race-condition-fo.patch
and it can be found in the queue-6.6 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit cf9b49f5c049c5ff4f5809b8410ee361a3d0f1c7
Author: Niklas Söderlund <niklas.soderlund+renesas@xxxxxxxxxxxx>
Date:   Tue Jul 2 21:02:30 2024 +0200

    clocksource/drivers/sh_cmt: Address race condition for clock events
    
    [ Upstream commit db19d3aa77612983a02bd223b3f273f896b243cf ]
    
    There is a race condition in the CMT interrupt handler. In the interrupt
    handler the driver sets a driver private flag, FLAG_IRQCONTEXT. This
    flag is used to indicate any call to set_next_event() should not be
    directly propagated to the device, but instead cached. This is done as
    the interrupt handler itself reprograms the device when needed before it
    completes and this avoids this operation to take place twice.
    
    It is unclear why this design was chosen, my suspicion is to allow the
    struct clock_event_device.event_handler callback, which is called while
    the FLAG_IRQCONTEXT is set, can update the next event without having to
    write to the device twice.
    
    Unfortunately there is a race between when the FLAG_IRQCONTEXT flag is
    set and later cleared where the interrupt handler have already started to
    write the next event to the device. If set_next_event() is called in
    this window the value is only cached in the driver but not written. This
    leads to the board to misbehave, or worse lockup and produce a splat.
    
       rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
       rcu:     0-...!: (0 ticks this GP) idle=f5e0/0/0x0 softirq=519/519 fqs=0 (false positive?)
       rcu:     (detected by 1, t=6502 jiffies, g=-595, q=77 ncpus=2)
       Sending NMI from CPU 1 to CPUs 0:
       NMI backtrace for cpu 0
       CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.10.0-rc5-arm64-renesas-00019-g74a6f86eaf1c-dirty #20
       Hardware name: Renesas Salvator-X 2nd version board based on r8a77965 (DT)
       pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
       pc : tick_check_broadcast_expired+0xc/0x40
       lr : cpu_idle_poll.isra.0+0x8c/0x168
       sp : ffff800081c63d70
       x29: ffff800081c63d70 x28: 00000000580000c8 x27: 00000000bfee5610
       x26: 0000000000000027 x25: 0000000000000000 x24: 0000000000000000
       x23: ffff00007fbb9100 x22: ffff8000818f1008 x21: ffff8000800ef07c
       x20: ffff800081c79ec0 x19: ffff800081c70c28 x18: 0000000000000000
       x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffffc2c717d8
       x14: 0000000000000000 x13: ffff000009c18080 x12: ffff8000825f7fc0
       x11: 0000000000000000 x10: ffff8000818f3cd4 x9 : 0000000000000028
       x8 : ffff800081c79ec0 x7 : ffff800081c73000 x6 : 0000000000000000
       x5 : 0000000000000000 x4 : ffff7ffffe286000 x3 : 0000000000000000
       x2 : ffff7ffffe286000 x1 : ffff800082972900 x0 : ffff8000818f1008
       Call trace:
        tick_check_broadcast_expired+0xc/0x40
        do_idle+0x9c/0x280
        cpu_startup_entry+0x34/0x40
        kernel_init+0x0/0x11c
        do_one_initcall+0x0/0x260
        __primary_switched+0x80/0x88
       rcu: rcu_preempt kthread timer wakeup didn't happen for 6501 jiffies! g-595 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
       rcu:     Possible timer handling issue on cpu=0 timer-softirq=262
       rcu: rcu_preempt kthread starved for 6502 jiffies! g-595 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
       rcu:     Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
       rcu: RCU grace-period kthread stack dump:
       task:rcu_preempt     state:I stack:0     pid:15    tgid:15    ppid:2      flags:0x00000008
       Call trace:
        __switch_to+0xbc/0x100
        __schedule+0x358/0xbe0
        schedule+0x48/0x148
        schedule_timeout+0xc4/0x138
        rcu_gp_fqs_loop+0x12c/0x764
        rcu_gp_kthread+0x208/0x298
        kthread+0x10c/0x110
        ret_from_fork+0x10/0x20
    
    The design have been part of the driver since it was first merged in
    early 2009. It becomes increasingly harder to trigger the issue the
    older kernel version one tries. It only takes a few boots on v6.10-rc5,
    while hundreds of boots are needed to trigger it on v5.10.
    
    Close the race condition by using the CMT channel lock for the two
    competing sections. The channel lock was added to the driver after its
    initial design.
    
    Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@xxxxxxxxxxxx>
    Link: https://lore.kernel.org/r/20240702190230.3825292-1-niklas.soderlund+renesas@xxxxxxxxxxxx
    Signed-off-by: Daniel Lezcano <daniel.lezcano@xxxxxxxxxx>
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/drivers/clocksource/sh_cmt.c b/drivers/clocksource/sh_cmt.c
index 26919556ef5f0..b72b36e0abed8 100644
--- a/drivers/clocksource/sh_cmt.c
+++ b/drivers/clocksource/sh_cmt.c
@@ -528,6 +528,7 @@ static void sh_cmt_set_next(struct sh_cmt_channel *ch, unsigned long delta)
 static irqreturn_t sh_cmt_interrupt(int irq, void *dev_id)
 {
 	struct sh_cmt_channel *ch = dev_id;
+	unsigned long flags;
 
 	/* clear flags */
 	sh_cmt_write_cmcsr(ch, sh_cmt_read_cmcsr(ch) &
@@ -558,6 +559,8 @@ static irqreturn_t sh_cmt_interrupt(int irq, void *dev_id)
 
 	ch->flags &= ~FLAG_SKIPEVENT;
 
+	raw_spin_lock_irqsave(&ch->lock, flags);
+
 	if (ch->flags & FLAG_REPROGRAM) {
 		ch->flags &= ~FLAG_REPROGRAM;
 		sh_cmt_clock_event_program_verify(ch, 1);
@@ -570,6 +573,8 @@ static irqreturn_t sh_cmt_interrupt(int irq, void *dev_id)
 
 	ch->flags &= ~FLAG_IRQCONTEXT;
 
+	raw_spin_unlock_irqrestore(&ch->lock, flags);
+
 	return IRQ_HANDLED;
 }
 
@@ -780,12 +785,18 @@ static int sh_cmt_clock_event_next(unsigned long delta,
 				   struct clock_event_device *ced)
 {
 	struct sh_cmt_channel *ch = ced_to_sh_cmt(ced);
+	unsigned long flags;
 
 	BUG_ON(!clockevent_state_oneshot(ced));
+
+	raw_spin_lock_irqsave(&ch->lock, flags);
+
 	if (likely(ch->flags & FLAG_IRQCONTEXT))
 		ch->next_match_value = delta - 1;
 	else
-		sh_cmt_set_next(ch, delta - 1);
+		__sh_cmt_set_next(ch, delta - 1);
+
+	raw_spin_unlock_irqrestore(&ch->lock, flags);
 
 	return 0;
 }