Hi Anna-Maria,

On Wed, May 29, 2019 at 04:53:05PM +0200, Anna-Maria Gleixner wrote:
> On Mon, 15 Apr 2019, Marcelo Tosatti wrote:
> 
> [...]
> 
> > The patch "timers: do not raise softirq unconditionally" from Thomas
> > attempts to address that by checking, in the sched tick, whether it's
> > necessary to raise the timer softirq.

https://lore.kernel.org/patchwork/patch/446045/

> > Unfortunately, it attempts to grab the tvec base spinlock, which
> > generates the issue described in the patch "Revert "timers: do not
> > raise softirq unconditionally"".

https://lore.kernel.org/patchwork/patch/552474/

> Neither patch is available in the version your patch set is based on.
> Better pointers would be helpful.

See above.

> > tvec_base->lock protects addition of timers to the wheel versus
> > timer interrupt execution.
> 
> The timer_base->lock (formerly known as tvec_base->lock) synchronizes all
> accesses to timer_base, not only addition of timers versus timer
> interrupt execution. Deletion of timers, getting the next timer interrupt,
> forwarding the base clock and migration of timers are protected by
> timer_base->lock as well.

Right.

> > This patch does not grab the tvec base spinlock from irq context,
> > but rather performs a lockless access to base->pending_map.
> 
> I cannot see where this patch performs a lockless access to
> timer_base->pending_map.

[patch 2/3] timers: do not raise softirq unconditionally (spinlockless version)

> > It handles the race between timer addition and timer interrupt
> > execution by unconditionally (in the case of isolated CPUs) raising the
> > timer softirq after making sure the updated bitmap is visible
> > on remote CPUs.
> 
> So after modifying a timer on a non-housekeeping timer base, the timer
> softirq is raised - even if there is no pending timer in the next
> bucket. Only with this patch, this shouldn't be a problem - but it is an
> additional raise of the timer softirq and an overhead when adding a timer,
> because the normal timer softirq is raised from the sched tick anyway.

It should be clear why this is necessary when reading

[patch 2/3] timers: do not raise softirq unconditionally (spinlockless version)

> > Signed-off-by: Marcelo Tosatti <mtosatti@xxxxxxxxxx>
> > 
> > ---
> >  kernel/time/timer.c |   38 ++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 38 insertions(+)
> > 
> > Index: linux-rt-devel/kernel/time/timer.c
> > ===================================================================
> > --- linux-rt-devel.orig/kernel/time/timer.c    2019-04-15 13:56:06.974210992 -0300
> > +++ linux-rt-devel/kernel/time/timer.c 2019-04-15 14:21:02.788704354 -0300
> > @@ -1056,6 +1063,17 @@
> >             internal_add_timer(base, timer);
> >     }
> > 
> > +   if (!housekeeping_cpu(base->cpu, HK_FLAG_TIMER) &&
> > +       !(timer->flags & TIMER_DEFERRABLE)) {
> > +           call_single_data_t *c;
> > +
> > +           c = per_cpu_ptr(&raise_timer_csd, base->cpu);
> > +
> > +           /* Make sure bitmap updates are visible on remote CPUs */
> > +           smp_wmb();
> > +           smp_call_function_single_async(base->cpu, c);
> > +   }
> > +
> >  out_unlock:
> >     raw_spin_unlock_irqrestore(&base->lock, flags);
> > 
> 
> Could you please explain why you decided to use the above
> implementation for raising the timer softirq after modifying a timer?
Because of the following race condition, which is open after "[patch 2/3]
timers: do not raise softirq unconditionally (spinlockless version)":

    CPU-0                                   CPU-1

    jiffies=99
    runs add_timer_on, with
    timer->expires=100
                                            jiffies=100
                                            run_softirq(), sees
                                            pending bitmap clear
    add_timer_on returns and
    timer was not executed    P)

This race did not exist before. So by raising a softirq on the remote CPU
at point P), it's ensured the timer will be executed ASAP.
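
To make that concrete, here is a minimal sketch of how the per-CPU
raise_timer_csd referenced in the hunk above can be wired up. Only the name
raise_timer_csd comes from the quoted patch; the callback name, the init
function name and the include lines are illustrative, not taken from the
actual series:

#include <linux/init.h>
#include <linux/smp.h>
#include <linux/percpu.h>
#include <linux/cpumask.h>
#include <linux/interrupt.h>

static DEFINE_PER_CPU(call_single_data_t, raise_timer_csd);

/* Runs in IPI context on the isolated CPU. */
static void raise_timer_softirq_fn(void *unused)
{
	/*
	 * The enqueueing CPU issued smp_wmb() before sending the csd,
	 * so the pending_map update is visible here and the softirq
	 * handler will find the newly added timer.
	 */
	raise_softirq(TIMER_SOFTIRQ);
}

static void __init raise_timer_csd_init(void)
{
	int cpu;

	for_each_possible_cpu(cpu) {
		call_single_data_t *c = per_cpu_ptr(&raise_timer_csd, cpu);

		c->func = raise_timer_softirq_fn;
		c->info = NULL;
	}
}

raise_timer_csd_init() would be called once during timer setup (from
init_timers(), for example - placement is an assumption here). With that in
place, the add path only needs the smp_wmb() + smp_call_function_single_async()
pair shown in the hunk above, and base->lock is never taken from irq context
on the isolated CPU.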