On 2024-10-22 15:28:56 [+0200], Frederic Weisbecker wrote: > Le Fri, Oct 04, 2024 at 12:17:04PM +0200, Sebastian Andrzej Siewior a écrit : > > A timer/ hrtimer softirq is raised in-IRQ context. With threaded > > interrupts enabled or on PREEMPT_RT this leads to waking the ksoftirqd > > for the processing of the softirq. > > It took me some time to understand the actual problem (yeah I know...) > > Can this be rephrased as: "Timer and hrtimer softirq vectors are special > in that they are always raised in-IRQ context whereas other vectors are > more likely to be raised from threaded interrupts or any regular tasks > when threaded interrupts or PREEMPT_RT are enabled. This leads to > waking ksoftirqd for the processing of the softirqs whenever timer > vectors are involved. Oki. > > Once the ksoftirqd is marked as pending (or is running) it will collect > > all raised softirqs. This in turn means that a softirq which would have > > been processed at the end of the threaded interrupt, which runs at an > > elevated priority, is now moved to ksoftirqd which runs at SCHED_OTHER > > priority and competes with every regular task for CPU resources. > > But for ksoftirqd to collect other non-timers softirqs, those vectors must > have been raised before from a threaded interrupt, right? So why those > vectors didn't get a chance to execute at the end of that threaded interrupt? This statement is no longer accurate since d15121be74856 ("Revert "softirq: Let ksoftirqd do its job"") So the "collect all" part is no longer. > OTOH one problem I can imagine is a threaded interrupt preempting ksoftirqd > which must wait for ksoftirqd to complete due to the local_bh_disable() > in the beginning of irq_forced_thread_fn(). But then isn't there some > PI involved on the local lock? Yes, there is PI involved on the local lock. So this will happen. > Sorry I'm probably missing something... Try again without the "ksoftirqd will collect it all" since this won't happen since the revert I mentioned. > > This introduces long delays on heavy loaded systems and is not desired > > especially if the system is not overloaded by the softirqs. > > > > Split the TIMER_SOFTIRQ and HRTIMER_SOFTIRQ processing into a dedicated > > timers thread and let it run at the lowest SCHED_FIFO priority. > > Wake-ups for RT tasks happen from hardirq context so only timer_list timers > > and hrtimers for "regular" tasks are processed here. > > That last sentence confuses me. How are timers for non regular task processed > elsewhere? Ah you mean RT tasks rely on hard hrtimers? Correct. A clock_nanosleep() for a RT task will result in wake_up() from hardirq. A clock_nanosleep() for a !RT task will result in wake_up() from ksoftirqd or now the suggested ktimersd. Quick question: Do we want this in forced-threaded mode, too? The timer (timer_list timer) and all HRTIMER_MODE_SOFT are handled in ksoftirqd. > > The higher priority > > ensures that wakeups are performed before scheduling SCHED_OTHER tasks. > > > > Using a dedicated variable to store the pending softirq bits values > > ensure that the timer are not accidentally picked up by ksoftirqd and > > other threaded interrupts. > > It shouldn't be picked up by ksoftirqd since it runs at lower priority. > > However if ksoftirqd is already running while a timer fires, then > > ksoftird will be PI-boosted due to the BH-lock to ktimer's priority. > > Ideally we try to avoid having ksoftirqd running. > > > > The timer thread can pick up pending softirqs from ksoftirqd but only > > if the softirq load is high. It is not be desired that the picked up > > softirqs are processed at SCHED_FIFO priority under high softirq load > > but this can already happen by a PI-boost by a force-threaded interrupt. > > > > [ frederic@xxxxxxxxxx: rcutorture.c fixes, storm fix by introduction of > > local_pending_timers() for tick_nohz_next_event() ] > > > > [ junxiao.chang@xxxxxxxxx: Ensure ktimersd gets woken up even if a > > softirq is currently served. ] > > > > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx> > > --- > > include/linux/interrupt.h | 29 ++++++++++++++ > > kernel/rcu/rcutorture.c | 6 +++ > > kernel/softirq.c | 82 ++++++++++++++++++++++++++++++++++++++- > > kernel/time/hrtimer.c | 4 +- > > kernel/time/tick-sched.c | 2 +- > > kernel/time/timer.c | 2 +- > > 6 files changed, 120 insertions(+), 5 deletions(-) > > > > diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h > > index 457151f9f263d..4a4f367cd6864 100644 > > --- a/include/linux/interrupt.h > > +++ b/include/linux/interrupt.h > > @@ -616,6 +616,35 @@ extern void __raise_softirq_irqoff(unsigned int nr); > > extern void raise_softirq_irqoff(unsigned int nr); > > extern void raise_softirq(unsigned int nr); > > > > +#ifdef CONFIG_PREEMPT_RT > > This needs a comment section to explain why a dedicated > timers processing is needed. Okay. > > +DECLARE_PER_CPU(struct task_struct *, timersd); > > +DECLARE_PER_CPU(unsigned long, pending_timer_softirq); > > + > > +extern void raise_timer_softirq(void); > > +extern void raise_hrtimer_softirq(void); > > + > > +static inline unsigned int local_pending_timers(void) > > Let's align with local_softirq_pending() naming and rather > have local_timers_pending() ? good. > > +{ > > + return __this_cpu_read(pending_timer_softirq); > > +} > > + > > +#ifdef CONFIG_PREEMPT_RT > > +static void timersd_setup(unsigned int cpu) > > +{ > > That also needs a comment. Why we want the priority I guess. … > > +void raise_hrtimer_softirq(void) > > +{ > > + raise_ktimers_thread(HRTIMER_SOFTIRQ); > > +} > > + > > +void raise_timer_softirq(void) > > +{ > > + unsigned long flags; > > + > > + local_irq_save(flags); > > + raise_ktimers_thread(TIMER_SOFTIRQ); > > + wake_timersd(); > > This is supposed to be called from hardirq only, right? > Can't irq_exit_rcu() take care of it? Why is it different > from HRTIMER_SOFTIRQ ? Good question. This shouldn't be any different compared to the hrtimer case. This is only raised in hardirq, so yes, the irq_save can go away and the wake call, too. > Thanks. Sebastian