We observe late ktimer events due to C-state transitions from CPU Idle mode. The test results come from a 5.15-rt kernel with two timers-related patches backported: "softirq: Use a dedicated thread for timer wakeups." from Sebastian via https://lore.kernel.org/linux-rt-users/20211201173316.rm6o67qapqsx2n47@xxxxxxxxxxxxx/ and "tick: Fix timer storm since introduction of timersd" from Frederic via https://lore.kernel.org/linux-rt-users/20220405010752.1347437-2-frederic@xxxxxxxxxx/ The kernel is configured with NO_HZ_COMMON=y and NO_HZ_IDLE=y. Expected sequence of events: @ 0ms: userspace enqueues a timer event to fire in 10ms from now @ 10ms: ktimer event fires and triggers the online_periodic callback @ 10ms: userspace enqueues a timer event to fire in 10ms from now @ 20ms: ktimer event fires and triggers the online_periodic callback ... What we’re actually seeing via kprobes: @ 0ms: userspace enqueues a timer event to fire in 10ms from now @ 10ms: ktimer event fires and triggers the online_periodic callback @ 10ms: userspace enqueues a timer event to fire in 10ms from now @ 15ms: CPU enters a deep idle state and sleeps for a long time (20-100+)ms, let's say 70ms for this example @ 80ms: ktimer event fires and triggers the online_periodic callback @ 80ms: userspace enqueues a timer event to fire in 10ms from now The result is that the second timer callback runs after 75ms instead of the expected 10ms. When the timers work well, the hrtimer_interrupt causes raise_hrtimer_softirq() to run, as expected and __do_softirq() runs: <userspace>-13534 [007] 16947.039513: enqueue_hrtimer <idle>-0 [007] 16947.039555: hrtimer_next_event_without: 16947.037643167 <idle>-0 [007] 16947.039562: enqueue_hrtimer <idle>-0 [007] 16947.049453: hrtimer_interrupt <idle>-0 [007] 16947.049457: raise_hrtimer_softirq. <idle>-0 [007] 16947.049474: enqueue_hrtimer ktimers/7-80 [007] 16947.049483: __do_softirq ktimers/7-80 [007] 16947.049485: hrtimer_run_softirq. <userspace>-13534 [007] 16947.049523: enqueue_hrtimer <idle>-0 [007] 16947.049567: hrtimer_next_event_without: 16947.047643167 <idle>-0 [007] 16947.049573: enqueue_hrtimer <idle>-0 [007] 16947.049586: enqueue_hrtimer <idle>-0 [007] 16947.049594: hrtimer_next_event_without: 16947.047643167 <idle>-0 [007] 16947.049598: enqueue_hrtimer <idle>-0 [007] 16947.059418: hrtimer_interrupt <idle>-0 [007] 16947.059423: raise_hrtimer_softirq. <idle>-0 [007] 16947.059441: enqueue_hrtimer ktimers/7-80 [007] 16947.059449: __do_softirq ktimers/7-80 [007] 16947.059451: hrtimer_run_softirq. <userspace>-13534 [007] 16947.059490: enqueue_hrtimer <idle>-0 [007] 16947.059529: hrtimer_next_event_without: 16947.057643167 <idle>-0 [007] 16947.059531: enqueue_hrtimer <idle>-0 [007] 16947.069434: hrtimer_interrupt When the timers are delayed, the trouble appears to begin when the hrtimer_interrupt results in execution of hrtimer_run_queues() instead of raise_hrtimer_softirq(): <userspace>-13534 [007] 16947.069504: enqueue_hrtimer <idle>-0 [007] 16947.069547: hrtimer_next_event_without: 16947.067643167 <idle>-0 [007] 16947.069553: enqueue_hrtimer <idle>-0 [007] 16947.069567: enqueue_hrtimer <idle>-0 [007] 16947.069575: hrtimer_next_event_without: 16947.067643167 <idle>-0 [007] 16947.069579: enqueue_hrtimer <idle>-0 [007] 16947.078270: hrtimer_interrupt <idle>-0 [007] 16947.078278: hrtimer_run_queues. <idle>-0 [007] 16947.078300: enqueue_hrtimer ktimers/7-80 [007] 16947.078308: __do_softirq ksoftirqd/7-81 [007] 16947.078338: ksoftirqd_should_run 0 <idle>-0 [007] 16947.078361: hrtimer_next_event_without: 16947.067643167 <idle>-0 [007] 16947.079323: hrtimer_interrupt <idle>-0 [007] 16947.079328: hrtimer_run_queues. <idle>-0 [007] 16947.079334: enqueue_hrtimer ksoftirqd/7-81 [007] 16947.079359: ksoftirqd_should_run 128 ksoftirqd/7-81 [007] 16947.079360: __do_softirq ksoftirqd/7-81 [007] 16947.079361: hrtimer_interrupt ksoftirqd/7-81 [007] 16947.079361: raise_hrtimer_softirq. ksoftirqd/7-81 [007] 16947.079364: ksoftirqd_should_run 0 <idle>-0 [007] 16947.079375: hrtimer_next_event_without: 9223372036854775807 <idle>-0 [007] 16947.079376: tick_nohz_get_sleep_length: 86.838 ms <idle>-0 [007] 16947.079378: enqueue_hrtimer Is the problem perhaps that ksoftirqd_should_run() needs to check local_timer_softirq_pending() created by Frederic's patch as well as local_softirq_pending()? For the moment, we find that replacing NO_HZ_COMMON=y and NO_HZ_IDLE=y with NO_HZ_PERIODIC prevents the undesirable sleep-state transition, but clearly this solution is not ideal. Thanks for any suggestions, Alison Chaiken achaiken@xxxxxxxxxxx Aurora Innovation