[ QUOTE ]
Hi Linus,

Please revert:

commit 5baefd6d84163443215f4a99f6a20f054ef11236
Author: John Stultz <johnstul@xxxxxxxxxx>
Date:   Tue Jul 10 18:43:25 2012 -0400

    hrtimer: Update hrtimer base offsets each hrtimer_interrupt

This breaks resume on the iBook G4 and Toshiba Portege R500 (at least), by adding an excessive delay to it (the Toshiba box sometimes hangs hard during resume from system suspend).

According to Andreas (https://lkml.org/lkml/2012/7/15/66):

"Apparently during or before noirq resume the system is hanging by the same amount of time as the system was sleeping."

which seems to agree with my observations.

Given that the two known-affected boxes are so different, it is quite probable that the total number of affected systems is actually quite high.

Thanks!

To everyone involved: the fact that this change, which was likely to introduce regressions from the look of it alone, has been pushed to Linus (and to -stable at the same time!) so late in the cycle, is seriously disappointing.

Thanks,
Rafael
[ /QUOTE ]

Hi,

when I first booted into Linux 3.5-rc7 (a few hours after its release), I got a call trace in get_next_timer_interrupt() (NULL pointer dereference) during early boot. The machine froze.

I can't say whether this is related to the issue reported here, but I can confirm that after suspend + resume the machine I am working on (a Sandy Bridge ultrabook) is in an unusable state. I had to do a cold reboot.

Regards,
- Sedat -

P.S.: Unfortunately, I could not reproduce the NULL dereference again. Thomas gave me some instructions for enabling the debugobjects kernel options (see the attached backlog from IRC).

Backlog #linux-rt (OFTC, German local time, UTC+2):

[09:27:26] <dileks> hi
[09:28:11] <dileks> tglx jstultz: with 3.5-rc7 I have a NULL pointer dereference in get_next_timer_interrupt
[09:28:19] <dileks> native_sched_clock
[09:28:31] <dileks> tick_nohz_stop_sched_tick.isra
[09:28:43] <dileks> tick_nohz_idle_enter
[09:28:46] <dileks> cpu_idle
[09:28:52] <dileks> start_secondary
[09:28:58] <dileks> machine freezes
[09:29:03] <tglx> brilliant
[09:29:05] <dileks> cold reboot/restart
[09:29:29] -*- dileks -> breakfast
[09:41:32] <-- trem (~trem@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx) has left the network (Quit: Ex-Chat)
[09:45:27] <dileks> re
[09:46:18] <dileks> tglx: any idea?
[09:47:42] <tglx> when does this happen ?
[09:47:48] <tglx> early boot ?
[09:48:06] <dileks> yes. nothing in the logs
[10:18:26] <tglx> hmm
[10:19:21] <tglx> so it explodes in get_next_timer_interrupt(), right ?
[10:20:01] <tglx> can you enable debugobjects ?
[10:20:12] <dileks> yes, as I saw and noted on a post-it sheet
[10:20:41] <tglx> DEBUG_OBJECTS
[10:20:46] <tglx> DEBUG_OBJECTS_FREE
[10:20:51] <tglx> DEBUG_OBJECTS_TIMERS
[10:21:05] <tglx> DEBUG_OBJECTS_ENABLE_DEFAULT
[10:22:19] <tglx> usually explosions in get_next_timer_interrupt() are caused by timers being corrupted
[10:22:52] <tglx> debug objects usually can catch it and let the box survive plus gives us proper info about the wreckage
[10:25:37] <dileks> OK
[10:26:03] <dileks> is the build-tree exploding in size?
[10:47:48] <tglx> not much
[10:48:17] <tglx> it's only the debugobject code itself plus the timer code which grows a bit
[10:48:26] <tglx> less than 1k I think

-dileks // 15-Jul-2011
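
For reference, the four options named in the backlog would look roughly like this in a kernel .config. This is only a sketch: the CONFIG_ prefix and the values are assumptions based on how these symbols are usually defined in lib/Kconfig.debug, and DEBUG_OBJECTS_ENABLE_DEFAULT is assumed to be an integer (0 or 1) rather than a y/n switch. Check the result with "make oldconfig" on the tree actually being built.

```
# Core debugobjects infrastructure
CONFIG_DEBUG_OBJECTS=y
# Check for live tracked objects in memory that is being freed
CONFIG_DEBUG_OBJECTS_FREE=y
# Track timer_list / hrtimer objects
CONFIG_DEBUG_OBJECTS_TIMERS=y
# Assumed integer option: 1 = debugobjects active from boot
CONFIG_DEBUG_OBJECTS_ENABLE_DEFAULT=1
```

If the boot-time default is left at 0, the checks can, as far as I know, still be switched on with the debug_objects kernel command-line parameter.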