On Tue, Feb 21, 2023 at 02:43:36PM +0100, Thomas Gleixner wrote:
> Commit d125d1349abeb46945dc5e98f7824bf688266f13 upstream.
>
> syzbot reported an RCU stall which is caused by setting up an alarmtimer
> with a very small interval and ignoring the signal. The reproducer arms the
> alarm timer with a relative expiry of 8ns and an interval of 9ns. Not a
> problem per se, but that's an issue when the signal is ignored because then
> the timer is immediately rearmed because there is no way to delay that
> rearming to the signal delivery path. See posix_timer_fn() and commit
> 58229a189942 ("posix-timers: Prevent softirq starvation by small intervals
> and SIG_IGN") for details.
>
> The reproducer does not set SIG_IGN explicitly, but it sets up the timer's
> signal with SIGCONT. That has the same effect as explicitly setting
> SIG_IGN for a signal as SIGCONT is ignored if there is no handler set and
> the task is not ptraced.
>
> The log clearly shows that:
>
> [pid 5102] --- SIGCONT {si_signo=SIGCONT, si_code=SI_TIMER, si_timerid=0, si_overrun=316014, si_int=0, si_ptr=NULL} ---
>
> It works because the tasks are traced and therefore the signal is queued so
> the tracer can see it, which delays the restart of the timer to the signal
> delivery path. But then the tracer is killed:
>
> [pid 5087] kill(-5102, SIGKILL <unfinished ...>
> ...
> ./strace-static-x86_64: Process 5107 detached
>
> and after it's gone the stall can be observed:
>
> syzkaller login: [ 79.439102][ C0] hrtimer: interrupt took 68471 ns
> [ 184.460538][ C1] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> ...
> [ 184.658237][ C1] rcu: Stack dump where RCU GP kthread last ran:
> [ 184.664574][ C1] Sending NMI from CPU 1 to CPUs 0:
> [ 184.669821][ C0] NMI backtrace for cpu 0
> [ 184.669831][ C0] CPU: 0 PID: 5108 Comm: syz-executor192 Not tainted 6.2.0-rc6-next-20230203-syzkaller #0
> ...
> [ 184.670036][ C0] Call Trace:
> [ 184.670041][ C0] <IRQ>
> [ 184.670045][ C0] alarmtimer_fired+0x327/0x670
>
> posix_timer_fn() prevents that by checking whether the interval for
> timers which have the signal ignored is smaller than a jiffy and
> artificially delaying it by shifting the next expiry out by a jiffy. That's
> accurate vs. the overrun accounting, but slightly inaccurate
> vs. timer_gettime(2).
>
> The comment in that function says what needs to be done and there was a fix
> available for the regular userspace induced SIG_IGN mechanism, but that did
> not work due to the implicit ignore for SIGCONT and similar signals. This
> needs to be worked on, but for now the only available workaround is to do
> exactly what posix_timer_fn() does:
>
> Increase the interval of self-rearming timers, which have their signal
> ignored, to at least a jiffy.
>
> Interestingly this has been fixed before via commit ff86bf0c65f1
> ("alarmtimer: Rate limit periodic intervals") already, but that fix got
> lost in a later rework.
>
> Fixes: f2c45807d399 ("alarmtimer: Switch over to generic set/get/rearm routine")
> Reported-by: syzbot+b9564ba6e8e00694511b@xxxxxxxxxxxxxxxxxxxxxxxxx
> Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Acked-by: John Stultz <jstultz@xxxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx
> Link: https://lore.kernel.org/r/87k00q1no2.ffs@tglx
> ---
> Backport for 4.14, 4.19, 5.4

Now queued up, thanks.

greg k-h