On Thu, Feb 9, 2023 at 7:40 AM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote: > On Thu, Feb 09 2023 at 12:19, Michael Nazzareno Trimarchi wrote: > > On Wed, Feb 8, 2023 at 7:06 PM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote: > >> I wrote that patch against the back then mainline code. No idea if it's > >> still applying, but the underlying issue is still the same AFAICT. > >> > >> It needs some polishing and a proper changelog. > >> > > Ok, I will try to update it on some mainline kernel in my environment > > and test it back. I need > > a little information if it's possible. Consider that I have no > > experience in this area. I understand how > > code was designed in general but the part around the freezer and all > > those code you remove, what was the logic behind in the removed code? > > What I can oracle out of that well commented gem is: > > A userspace task invokes clock_nanosleep(CLOCK_*_ALARM, ...), which > arms an alarm timer. The expiry of an alarmtimer causes the system to > come out of suspend. > > As the task invokes schedule() it can also be brought back from > schedule() via a signal. If that happens then the task cancels the > alarmtimer and returns to handle the signal. While doing that it can > be frozen, which means the alarm and therefore the wake from suspend > is lost. > > To prevent that the code tries to save the earliest expiring alarm if > the task is marked freezing() and the suspend code takes that into > account. > > John, did I summarize that correctly? > > The change I made remove that magic and marks the task freezable when it > goes to schedule, which prevents the signal wakeup. That ensures that > the alarm stays armed during freeze/suspend. > > That obviously needs some testing and scrunity by the folks which use > this mechanism. IIRC that's used by android, but I might be wrong. So, thanks for dredging this old thread up, I'm sorry I didn't see it the first time it came around. Not having a clear memory of why we do the (min == 0) early return, I went digging in, and found it was in the original git commit, so I went looking to the archives. It's completely not present in the first version of the patch: https://lore.kernel.org/lkml/1288809079-14663-8-git-send-email-john.stultz@xxxxxxxxxx/ But it did appear in the second version: https://lore.kernel.org/lkml/1290136329-18291-6-git-send-email-john.stultz@xxxxxxxxxx/ And from there it's a clear case of wanting to avoid setting the RTC if there were just no timers to expire. But, indeed this is a bug, as initializing min to ktime_set(0, 0) as the "invalid" case isn't a good plan, as it might be possible that suspend is run right as an alarmtimer expires, and you get a real zero delta value (as has been reported). Instead it seemed I should have used KTIME_MAX as the "invalid" case (as Thomas' patch uses). However, before the patch was merged, it changed further to handle the freezer waking a current sleeper: https://lore.kernel.org/lkml/1294280159-2513-13-git-send-email-john.stultz@xxxxxxxxxx/ Which was then used to initialize the min value (still erroneously using 0 as an "invalid" value) in the case that the freezer woke a task sleeping which would cause the alarm to be lost (as Thomas summarized). Thomas' patch fixes the erronious 0-as-invalid initialization issue using KTIME_MAX but also simplifies the logic getting rid of the freezer handling. I don't have as much familiarity with the freezer handling change, so while it looks sane, I can't say I would likely catch an issue doing a visual review. Michael: If you are still intending to send the patch out, please do, otherwise I've already forward ported it so I can do some testing with it. I'm happy to put together a commit message and send it out if that's easier for you. And, just for correct Reported-by: tags: Michael Trimarchi, are you the same Michael (michael@xxxxxxxxx) that originally reported this issue? thanks -john