On Thu, May 21, 2020 at 07:00:20PM +0200, Juri Lelli wrote: > ksoftirqd/3-26 [003] 99.942485: timer_expire_entry: timer=0xffffffffa55a9d20 function=clocksource_watchdog now=4294759328 baseclk=4294759328 > ksoftirqd/3-26 [003] 99.942489: timer_start: timer=0xffffffffa55a9d20 function=clocksource_watchdog expires=4294759822 [timeout=494] cpu=4 idx=114 flags=D|P|I > ksoftirqd/3-26 [003] 99.942491: timer_expire_exit: timer=0xffffffffa55a9d20 > sysjitter-2377 [004] 99.942491: call_function_single_entry: vector=251 > sysjitter-2377 [004] 99.942492: call_function_single_exit: vector=251 > ksoftirqd/3-26 [003] 99.942493: softirq_exit: vec=1 [action=TIMER] > ksoftirqd/3-26 [003] 99.942494: softirq_entry: vec=7 [action=SCHED] > sysjitter-2377 [004] 99.942494: hrtimer_start: hrtimer=0xffff92e9df91fd20 function=tick_sched_timer/0x0 expires=92630457385 softexpires=92630457385 > --- > > So, my understanding is that clock gets reactivated on CPU 4 to run > clocksource watchdog (indeed if I use tsc=nowatchdog this above doesn't > happen, but the potential issue is still there I guess). CPU 4 is being disturbed by an IPI. It's possibly caused by RCU, there is a pending fix that I need to refresh and repost but you may want to give it a try: https://lore.kernel.org/lkml/20200515003429.4317-1-frederic@xxxxxxxxxx/ > > --- > sysjitter-2377 [004] 100.438455: local_timer_entry: vector=236 > sysjitter-2377 [004] 100.438456: hrtimer_cancel: hrtimer=0xffff92e9df91fd20 > sysjitter-2377 [004] 100.438458: hrtimer_expire_entry: hrtimer=0xffff92e9df91fd20 now=92630461337 function=tick_sched_timer/0x0 > sysjitter-2377 [004] 100.438460: kernel_stack: <stack trace> > => __ftrace_trace_stack (ffffffffa37da921) > => __raise_softirq_irqoff (ffffffffa36daf50) > => raise_softirq (ffffffffa36db0fe) > => update_process_times (ffffffffa3768f9d) > => tick_sched_handle (ffffffffa377aaa2) > => tick_sched_timer (ffffffffa377ad53) > => __hrtimer_run_queues (ffffffffa3769cf0) > => hrtimer_interrupt (ffffffffa376a58a) > => smp_apic_timer_interrupt (ffffffffa40028f8) > => apic_timer_interrupt (ffffffffa4001b7f) > sysjitter-2377 [004] 100.438461: softirq_raise: vec=1 [action=TIMER] > sysjitter-2377 [004] 100.438464: kernel_stack: <stack trace> > => __ftrace_trace_stack (ffffffffa37da921) > => __raise_softirq_irqoff (ffffffffa36daf50) > => raise_softirq (ffffffffa36db0fe) > => trigger_load_balance (ffffffffa371cb9c) > => update_process_times (ffffffffa3768fc7) > => tick_sched_handle (ffffffffa377aaa2) > => tick_sched_timer (ffffffffa377ad53) > => __hrtimer_run_queues (ffffffffa3769cf0) > => hrtimer_interrupt (ffffffffa376a58a) > => smp_apic_timer_interrupt (ffffffffa40028f8) > => apic_timer_interrupt (ffffffffa4001b7f) > sysjitter-2377 [004] 100.438465: softirq_raise: vec=7 [action=SCHED] > --- > > Tick fires, the two softirq_raise (which you wondered about) seems to > come from > > 1. update_process_times::run_local_timers (TIMER_SOFTIRQ) > 2. update_process_times::scheduler_tick::trigger_load_balance (SCHED_SOFTIRQ) Well, the places raising the softirqs are always the same. Now how the timer internals can decide to raise the TIMER_SOFTIRQ when there is no callback expired is more mysterious. > > --- > sysjitter-2377 [004] 100.438468: hrtimer_expire_exit: hrtimer=0xffff92e9df91fd20 > sysjitter-2377 [004] 100.438470: local_timer_exit: vector=236 > sysjitter-2377 [004] 100.438473: sched_wakeup: ksoftirqd/4:31 [120] success=1 CPU:004 > sysjitter-2377 [004] 100.438474: tick_stop: success=0 dependency=SCHED > sysjitter-2377 [004] 100.438476: hrtimer_start: hrtimer=0xffff92e9df91fd20 function=tick_sched_timer/0x0 expires=92630512937 softexpires=92630512937 > sysjitter-2377 [004] 100.438484: irq_work_entry: vector=246 > sysjitter-2377 [004] 100.438491: irq_work_exit: vector=246 > sysjitter-2377 [004] 100.438492: tick_stop: success=0 dependency=SCHED > sysjitter-2377 [004] 100.438495: sched_switch: sysjitter:2377 [120] R ==> ksoftirqd/4:31 [120] > ksoftirqd/4-31 [004] 100.438497: softirq_entry: vec=1 [action=TIMER] > ksoftirqd/4-31 [004] 100.438499: timer_cancel: timer=0xffffffffa55a9d20 > ksoftirqd/4-31 [004] 100.438501: timer_expire_entry: timer=0xffffffffa55a9d20 function=clocksource_watchdog now=4294759824 baseclk=4294759824 > ksoftirqd/4-31 [004] 100.438504: timer_start: timer=0xffffffffa55a9d20 function=clocksource_watchdog expires=4294760322 [timeout=498] cpu=5 idx=113 flags=D|P|I > --- > > Queueing clocksource watchdog on CPU5. > > --- > ksoftirqd/4-31 [004] 100.438506: timer_expire_exit: timer=0xffffffffa55a9d20 > ksoftirqd/4-31 [004] 100.438507: local_timer_entry: vector=236 > --- > > Back to back timer? It looks like the tick has been reprogrammed very early indeed. > > --- > ksoftirqd/4-31 [004] 100.438509: hrtimer_cancel: hrtimer=0xffff92e9df91fd20 > ksoftirqd/4-31 [004] 100.438511: hrtimer_expire_entry: hrtimer=0xffff92e9df91fd20 now=92630515022 function=tick_sched_timer/0x0 > ksoftirqd/4-31 [004] 100.438515: kernel_stack: <stack trace> > => __ftrace_trace_stack (ffffffffa37da921) > => __raise_softirq_irqoff (ffffffffa36daf50) > => raise_softirq (ffffffffa36db0fe) > => rcu_sched_clock_irq (ffffffffa375af4a) > => update_process_times (ffffffffa3768fa4) > => tick_sched_handle (ffffffffa377aaa2) > => tick_sched_timer (ffffffffa377ad53) > => __hrtimer_run_queues (ffffffffa3769cf0) > => hrtimer_interrupt (ffffffffa376a58a) > => smp_apic_timer_interrupt (ffffffffa40028f8) > => apic_timer_interrupt (ffffffffa4001b7f) > => filter_pred_32 (ffffffffa37f3357) > => filter_match_preds (ffffffffa37f3510) > => trace_event_buffer_commit (ffffffffa37dc7eb) > => trace_event_raw_event_softirq (ffffffffa36dab77) > => __do_softirq (ffffffffa420025a) > => run_ksoftirqd (ffffffffa36dadc6) > => smpboot_thread_fn (ffffffffa36ffdb8) > => kthread (ffffffffa36f9fb7) > => ret_from_fork (ffffffffa4000215) > ksoftirqd/4-31 [004] 100.438516: softirq_raise: vec=9 [action=RCU] > ksoftirqd/4-31 [004] 100.438520: kernel_stack: <stack trace> > => __ftrace_trace_stack (ffffffffa37da921) > => __raise_softirq_irqoff (ffffffffa36daf50) > => raise_softirq (ffffffffa36db0fe) > => trigger_load_balance (ffffffffa371cb9c) > => update_process_times (ffffffffa3768fc7) > => tick_sched_handle (ffffffffa377aaa2) > => tick_sched_timer (ffffffffa377ad53) > => __hrtimer_run_queues (ffffffffa3769cf0) > => hrtimer_interrupt (ffffffffa376a58a) > => smp_apic_timer_interrupt (ffffffffa40028f8) > => apic_timer_interrupt (ffffffffa4001b7f) > => filter_pred_32 (ffffffffa37f3357) > => filter_match_preds (ffffffffa37f3510) > => trace_event_buffer_commit (ffffffffa37dc7eb) > => trace_event_raw_event_softirq (ffffffffa36dab77) > => __do_softirq (ffffffffa420025a) > => run_ksoftirqd (ffffffffa36dadc6) > => smpboot_thread_fn (ffffffffa36ffdb8) > => kthread (ffffffffa36f9fb7) > => ret_from_fork (ffffffffa4000215) > ksoftirqd/4-31 [004] 100.438520: softirq_raise: vec=7 [action=SCHED] > ksoftirqd/4-31 [004] 100.438521: hrtimer_expire_exit: hrtimer=0xffff92e9df91fd20 > ksoftirqd/4-31 [004] 100.438523: hrtimer_start: hrtimer=0xffff92e9df91fd20 function=tick_sched_timer/0x0 expires=92631512937 softexpires=92631512937 > ksoftirqd/4-31 [004] 100.438525: local_timer_exit: vector=236 > ksoftirqd/4-31 [004] 100.438527: tick_stop: success=0 dependency=SCHED > --- > > At this point (I think) ksoftirqd is already going to sleep, so only > sysjitter will be eventually executing on CPU4, but tick_stop check > still sees nr_running == 2. As long at it hasn't been dequeued from the runqueue, tick_stop sees nr_running == 2. > > --- > ksoftirqd/4-31 [004] 100.438530: softirq_exit: vec=1 [action=TIMER] Was there a corresponding softirq_enter? With timers expiring inside? > ksoftirqd/4-31 [004] 100.438531: softirq_entry: vec=7 [action=SCHED] > ksoftirqd/4-31 [004] 100.438533: softirq_exit: vec=7 [action=SCHED] > ksoftirqd/4-31 [004] 100.438534: softirq_entry: vec=7 [action=SCHED] > ksoftirqd/4-31 [004] 100.438535: softirq_exit: vec=7 [action=SCHED] > ksoftirqd/4-31 [004] 100.438536: softirq_entry: vec=9 [action=RCU] > ksoftirqd/4-31 [004] 100.438537: softirq_exit: vec=9 [action=RCU] > ksoftirqd/4-31 [004] 100.438540: sched_switch: ksoftirqd/4:31 [120] S ==> sysjitter:2377 [120] > --- > > And tick again fires below. > > --- > sysjitter-2377 [004] 100.439509: local_timer_entry: vector=236 > sysjitter-2377 [004] 100.439511: hrtimer_cancel: hrtimer=0xffff92e9df91fd20 > sysjitter-2377 [004] 100.439512: hrtimer_expire_entry: hrtimer=0xffff92e9df91fd20 now=92631515408 function=tick_sched_timer/0x0 Right that's expected behaviour. > sysjitter-2377 [004] 100.439518: softirq_raise: vec=7 [action=SCHED] > sysjitter-2377 [004] 100.439520: hrtimer_expire_exit: hrtimer=0xffff92e9df91fd20 > sysjitter-2377 [004] 100.439521: hrtimer_start: hrtimer=0xffff92e9df91fd20 function=tick_sched_timer/0x0 expires=92632512937 softexpires=92632512937 > sysjitter-2377 [004] 100.439522: local_timer_exit: vector=236 > sysjitter-2377 [004] 100.439525: sched_wakeup: ksoftirqd/4:31 [120] success=1 CPU:004 > sysjitter-2377 [004] 100.439527: tick_stop: success=0 dependency=SCHED > sysjitter-2377 [004] 100.439529: irq_work_entry: vector=246 > sysjitter-2377 [004] 100.439531: irq_work_exit: vector=246 > sysjitter-2377 [004] 100.439532: tick_stop: success=0 dependency=SCHED > sysjitter-2377 [004] 100.439534: sched_switch: sysjitter:2377 [120] R ==> ksoftirqd/4:31 [120] > ksoftirqd/4-31 [004] 100.439536: softirq_entry: vec=1 [action=TIMER] > ksoftirqd/4-31 [004] 100.439537: softirq_exit: vec=1 [action=TIMER] The empty timer softirq again. > ksoftirqd/4-31 [004] 100.439538: softirq_entry: vec=7 [action=SCHED] > ksoftirqd/4-31 [004] 100.439540: softirq_exit: vec=7 [action=SCHED] I'm also wondering if it's expected to see the sched softirq repeating so often. > ksoftirqd/4-31 [004] 100.439541: sched_switch: ksoftirqd/4:31 [120] S ==> sysjitter:2377 [120] > --- > > History seems to repeat itself until ~2s later (this is one of the > pathological runs) when an ipi fires on CPU4 and SCHED dependecy is seen > as cleared (as only sysjitter thread was is running at that point). > > Does this make sense and help in any way? Suggestions for debugging this > further? :-) Let's see if I can reproduce it first. If not I'll need to bother you further :) Thanks!