Re: [tip: timers/urgent] hrtimers: Handle CPU state correctly on hotplug

Borislav Petkov <bp@xxxxxxxxx> · Thu, 16 Jan 2025 11:56:30 +0100

On Thu, Jan 16, 2025 at 09:50:38AM -0000, tip-bot2 for Koichiro Den wrote:
> The following commit has been merged into the timers/urgent branch of tip:
> 
> Commit-ID:     e00954d8b0a2d54075131fbad4a11b2b7355eee1
> Gitweb:        https://git.kernel.org/tip/e00954d8b0a2d54075131fbad4a11b2b7355eee1
> Author:        Koichiro Den <koichiro.den@xxxxxxxxxxxxx>
> AuthorDate:    Fri, 20 Dec 2024 22:44:21 +09:00
> Committer:     Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> CommitterDate: Thu, 16 Jan 2025 10:39:25 +01:00
> 
> hrtimers: Handle CPU state correctly on hotplug
> 
> Consider a scenario where a CPU transitions from CPUHP_ONLINE to halfway
> through a CPU hotunplug down to CPUHP_HRTIMERS_PREPARE, and then back to
> CPUHP_ONLINE:
> 
> Since hrtimers_prepare_cpu() does not run, cpu_base.hres_active remains set
> to 1 throughout. However, during a CPU unplug operation, the tick and the
> clockevents are shut down at CPUHP_AP_TICK_DYING. On return to the online
> state, for instance CFS incorrectly assumes that the hrtick is already
> active, and the chance of the clockevent device to transition to oneshot
> mode is also lost forever for the CPU, unless it goes back to a lower state
> than CPUHP_HRTIMERS_PREPARE once.
> 
> This round-trip reveals another issue; cpu_base.online is not set to 1
> after the transition, which appears as a WARN_ON_ONCE in enqueue_hrtimer().
> 
> Aside of that, the bulk of the per CPU state is not reset either, which
> means there are dangling pointers in the worst case.
> 
> Address this by adding a corresponding startup() callback, which resets the
> stale per CPU state and sets the online flag.
> 
> [ tglx: Make the new callback unconditionally available, remove the online
>   	modification in the prepare() callback and clear the remaining
>   	state in the starting callback instead of the prepare callback ]
> 
> Fixes: 5c0930ccaad5 ("hrtimers: Push pending hrtimers away from outgoing CPU earlier")
> Signed-off-by: Koichiro Den <koichiro.den@xxxxxxxxxxxxx>
> Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx
> Link: https://lore.kernel.org/all/20241220134421.3809834-1-koichiro.den@xxxxxxxxxxxxx
> ---
>  include/linux/hrtimer.h |  1 +
>  kernel/cpu.c            |  2 +-
>  kernel/time/hrtimer.c   | 10 +++++++++-
>  3 files changed, 11 insertions(+), 2 deletions(-)

This causes the below. Lemme zap it.

[    0.324444] ------------[ cut here ]------------
[    0.325783] WARNING: CPU: 0 PID: 0 at kernel/time/hrtimer.c:1076 enqueue_hrtimer+0x77/0x80
[    0.325783] Modules linked in:
[    0.325783] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.13.0-rc7+ #1
[    0.325783] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 2023.11-8 02/21/2024
[    0.325783] RIP: 0010:enqueue_hrtimer+0x77/0x80
[    0.325783] Code: cb 7e 48 8b 05 42 8c b5 01 48 85 c0 74 0c 48 8b 78 08 48 89 ee e8 39 c1 ff ff 65 ff 0d 9a 42 cb 7e 75 9f 0f 1f 44 00 00 eb 98 <0f> 0b eb 9d 0f 1f 44 00 00 0f 1f 44 00 00 41 57 41 56 49 89 ce 41
[    0.325783] RSP: 0000:ffffffff82603c28 EFLAGS: 00010046
[    0.325783] RAX: ff11000074225e80 RBX: ff11000074225ec0 RCX: 00000000448b22fc
[    0.325783] RDX: 0000000000000008 RSI: ff11000074225ec0 RDI: ff110000742349c0
[    0.325783] RBP: ff110000742349c0 R08: 000000390fc0a600 R09: fffffffffffffffe
[    0.325783] R10: 0000000000000000 R11: 0000000000000000 R12: ff11000074225e80
[    0.325783] R13: ff11000074225ec0 R14: ff11000074225ec0 R15: 0000000000000040
[    0.325783] FS:  0000000000000000(0000) GS:ff11000074200000(0000) knlGS:0000000000000000
[    0.325783] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.325783] CR2: ff11000004601000 CR3: 0000000002e22001 CR4: 0000000000771ef0
[    0.325783] PKRU: 55555554
[    0.325783] Call Trace:
[    0.325783]  <TASK>
[    0.325783]  ? __warn+0x89/0x130
[    0.325783]  ? enqueue_hrtimer+0x77/0x80
[    0.325783]  ? report_bug+0x164/0x190
[    0.325783]  ? handle_bug+0x58/0x90
[    0.325783]  ? exc_invalid_op+0x17/0x70
[    0.325783]  ? asm_exc_invalid_op+0x1a/0x20
[    0.325783]  ? enqueue_hrtimer+0x77/0x80
[    0.325783]  hrtimer_start_range_ns+0x258/0x370
[    0.325783]  start_dl_timer+0xa6/0x130
[    0.325783]  enqueue_dl_entity+0x43d/0xa60
[    0.325783]  dl_server_start+0x42/0x190
[    0.325783]  enqueue_task_fair+0x22f/0x5e0
[    0.325783]  enqueue_task+0x32/0x150
[    0.325783]  ? post_init_entity_util_avg+0x29/0x160
[    0.325783]  wake_up_new_task+0x19b/0x300
[    0.325783]  kernel_clone+0x126/0x430
[    0.325783]  user_mode_thread+0x5f/0x90
[    0.325783]  ? rest_init+0xd0/0xd0
[    0.325783]  rest_init+0x1e/0xd0
[    0.325783]  start_kernel+0x5fc/0x870
[    0.325783]  x86_64_start_reservations+0x18/0x30
[    0.325783]  x86_64_start_kernel+0x96/0xa0
[    0.325783]  common_startup_64+0x13e/0x141
[    0.325783]  </TASK>
[    0.325783] ---[ end trace 0000000000000000 ]---

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette