On 27/04/2020 21.26, Tom Zanussi wrote: > On Mon, 2020-04-27 at 15:06 -0400, Steven Rostedt wrote: >> On Mon, 27 Apr 2020 15:10:00 +0200 >> Rasmus Villemoes <rasmus.villemoes@xxxxxxxxx> wrote: >> >>> Reverting >>> >>> b1a471ec4df1 - hrtimer: Prevent using hrtimer_grab_expiry_lock() on >>> migration_base >>> 40aae5708e7a - hrtimer: Add a missing bracket and hide >>> `migration_base' >>> on !SMP >>> >>> on top of v4.19.94-rt39 makes that problem go away, i.e. the board >>> reboots as expected. >>> >> Thanks Rasmus for looking into this. Tom now maintains 4.19-rt. >> >> Tom, care to pull in these patches on top of 4.19-rt? >> > > Those patches are already in 4.19-rt - he's saying that reverting them > fixes the problem. > > I'm guessing that the assumption of base or base->cpu_base always being > non-NULL in those patches might be wrong. If so, the below patch > should fix the problem: > > Subject: [PATCH] hrtimer: Add back base and base->cpu_base checks in > hrtimer_grab_expiry_lock() > > 4.19 commit b1a471ec4df1 [hrtimer: Prevent using > hrtimer_grab_expiry_lock() on migration_base] removed the NULL checks > for timer->base and timer->base->cpu_base on the assumption that > they're always non-NULL. That assumption is apparently not to be > true, so add the checks back. > > Signed-off-by: Tom Zanussi <zanussi@xxxxxxxxxx> > --- > kernel/time/hrtimer.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c > index e54a95de8b79..6f20cf23008b 100644 > --- a/kernel/time/hrtimer.c > +++ b/kernel/time/hrtimer.c > @@ -953,7 +953,7 @@ void hrtimer_grab_expiry_lock(const struct hrtimer *timer) > { > struct hrtimer_clock_base *base = READ_ONCE(timer->base); > > - if (timer->is_soft && is_migration_base(base)) { > + if (timer->is_soft && base && base->cpu_base && is_migration_base(base)) { I'm sorry, but no, I don't think that can be it. For !SMP (my case), is_migration_base() is always false, so with or without the above, the whole if() is false. Also, the symptoms do not look like a NULL pointer deref, but more like a dead (or live) lock - so I'm guessing (and that's just a wild guess) that the lock/unlock sequence is needed to give some other thread a priority boost to make the whole machine make forward progress. Rasmus