The following commit has been merged into the timers/urgent branch of tip:

Commit-ID:     4b6f4c5a67c07417bf29d896c76f513a4be07516
Gitweb:        https://git.kernel.org/tip/4b6f4c5a67c07417bf29d896c76f513a4be07516
Author:        Frederic Weisbecker <frederic@xxxxxxxxxx>
AuthorDate:    Fri, 15 Mar 2024 02:14:47 +01:00
Committer:     Thomas Gleixner <tglx@xxxxxxxxxxxxx>
CommitterDate: Sat, 16 Mar 2024 19:55:46 +01:00

timer/migration: Remove buggy early return on deactivation

When a CPU enters idle and deactivates itself from the timer
migration hierarchy without any global timer of its own to propagate,
the group event of that CPU is set to "ignore" and tmigr_update_events()
accordingly performs an early return without considering timers queued
by other CPUs.

If the hierarchy has a single level, and the CPU is the last one to
enter idle, it will ignore others' global timers, as in the following
layout:

           [GRP0:0]
         migrator = 0
         active   = 0
         nextevt  = T0i
          /         \
         0           1
      active (T0i)  idle (T1)

0) CPU 0 is active thus its event is ignored (the letter 'i') and so are
upper levels' events. CPU 1 is idle and has the timer T1 enqueued.

           [GRP0:0]
        migrator = NONE
        active   = NONE
        nextevt  = T0i
          /         \
         0           1
      idle (T0i)  idle (T1)

1) CPU 0 goes idle without a global event queued. Therefore KTIME_MAX is
pushed as its next expiry and its own event is kept as "ignore". As a
result tmigr_update_events() ignores T1 and CPU 0 goes to idle with T1
unhandled.

This isn't specific to single-level hierarchies, though. A similar issue,
although slightly different, may arise on multi-level:

                            [GRP1:0]
                         migrator = GRP0:0
                         active   = GRP0:0
                         nextevt  = T0:0i, T0:1
                         /              \
              [GRP0:0]                  [GRP0:1]
           migrator = 0              migrator = NONE
           active   = 0              active   = NONE
           nextevt  = T0i            nextevt  = T2
           /         \                /         \
          0 (T0i)     1 (T1)         2 (T2)      3
          active      idle           idle        idle

0) CPU 0 is active thus its event is ignored (the letter 'i') and so are
upper levels' events. CPU 1 is idle and has the timer T1 enqueued.
CPU 2 also has a timer.
The expiry order is T0 (ignored) < T1 < T2

                            [GRP1:0]
                         migrator = GRP0:0
                         active   = GRP0:0
                         nextevt  = T0:0i, T0:1
                         /              \
              [GRP0:0]                  [GRP0:1]
           migrator = NONE           migrator = NONE
           active   = NONE           active   = NONE
           nextevt  = T0i            nextevt  = T2
           /         \                /         \
          0 (T0i)     1 (T1)         2 (T2)      3
          idle        idle           idle        idle

1) CPU 0 goes idle without a global event queued. Therefore KTIME_MAX is
pushed as its next expiry and its own event is kept as "ignore". As a
result tmigr_update_events() ignores T1. The change has only propagated
up to the first level so far.

                            [GRP1:0]
                         migrator = NONE
                         active   = NONE
                         nextevt  = T0:1
                         /              \
              [GRP0:0]                  [GRP0:1]
           migrator = NONE           migrator = NONE
           active   = NONE           active   = NONE
           nextevt  = T0i            nextevt  = T2
           /         \                /         \
          0 (T0i)     1 (T1)         2 (T2)      3
          idle        idle           idle        idle

2) The change now propagates up to the top. tmigr_update_events() finds
that the child event is ignored and thus removes it. The top level next
event is now T2, which is returned to CPU 0 as its next effective expiry
to take into account as the global idle migrator. However T1 has been
ignored along the way, leaving it unhandled.

Fix those issues by removing the buggy early return. Ignored child
events must not prevent the other events within the same group from
being evaluated.

Reported-by: Boqun Feng <boqun.feng@xxxxxxxxx>
Reported-by: Florian Fainelli <f.fainelli@xxxxxxxxx>
Reported-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Signed-off-by: Frederic Weisbecker <frederic@xxxxxxxxxx>
Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Tested-by: Florian Fainelli <florian.fainelli@xxxxxxxxxxxx>
Link: https://lore.kernel.org/r/ZfOhB9ZByTZcBy4u@lothringen
---
 kernel/time/timer_migration.c | 20 --------------------
 1 file changed, 20 deletions(-)

diff --git a/kernel/time/timer_migration.c b/kernel/time/timer_migration.c
index 8f49b6b..611cd90 100644
--- a/kernel/time/timer_migration.c
+++ b/kernel/time/timer_migration.c
@@ -751,26 +751,6 @@ bool tmigr_update_events(struct tmigr_group *group, struct tmigr_group *child,
 
 	first_childevt = evt = data->evt;
 
-	/*
-	 * Walking the hierarchy is required in any case when a
-	 * remote expiry was done before. This ensures to not lose
-	 * already queued events in non active groups (see section
-	 * "Required event and timerqueue update after a remote
-	 * expiry" in the documentation at the top).
-	 *
-	 * The two call sites which are executed without a remote expiry
-	 * before, are not prevented from propagating changes through
-	 * the hierarchy by the return:
-	 *   - When entering this path by tmigr_new_timer(), @evt->ignore
-	 *     is never set.
-	 *   - tmigr_inactive_up() takes care of the propagation by
-	 *     itself and ignores the return value. But an immediate
-	 *     return is required because nothing has to be done in this
-	 *     level as the event could be ignored.
-	 */
-	if (evt->ignore && !remote)
-		return true;
-
 	raw_spin_lock(&group->lock);
 
 	childstate.state = 0;
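
For illustration only, the single-level scenario from the changelog can be
modelled with a small standalone C sketch. This is not the kernel code:
struct toy_event, toy_next_wakeup(), TOY_EXPIRY_MAX and the
buggy_early_return flag are hypothetical names, and the model assumes a
flat group with no locking, no timerqueue and no upper levels. It only
shows why returning early on an ignored child event hides the events of
the other children in the group.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for KTIME_MAX: "no event to wait for". */
#define TOY_EXPIRY_MAX INT64_MAX

/* Hypothetical, simplified model of one child's group event. */
struct toy_event {
	int64_t expires;	/* expiry of the child's first global timer */
	bool ignore;		/* event must not be taken into account */
};

/*
 * Return the expiry that @cpu, the last CPU going idle in a single-level
 * group of @ncpu children, has to wake up for.
 *
 * With @buggy_early_return set, an ignored event of @cpu short-circuits
 * the evaluation, which models the removed
 * "if (evt->ignore && !remote) return true;" check: the events of the
 * other children are never examined.
 */
static int64_t toy_next_wakeup(const struct toy_event *evt, int ncpu, int cpu,
			       bool buggy_early_return)
{
	int64_t first = TOY_EXPIRY_MAX;
	int i;

	assert(cpu >= 0 && cpu < ncpu);

	if (buggy_early_return && evt[cpu].ignore)
		return TOY_EXPIRY_MAX;

	/* Evaluate every non-ignored child event of the group. */
	for (i = 0; i < ncpu; i++) {
		if (!evt[i].ignore && evt[i].expires < first)
			first = evt[i].expires;
	}

	return first;
}
```

With CPU 0's event set to { TOY_EXPIRY_MAX, true } and CPU 1's event set to
{ 100, false } (T1 = 100 in arbitrary units), toy_next_wakeup(evt, 2, 0, true)
yields TOY_EXPIRY_MAX, so T1 is left unhandled, whereas
toy_next_wakeup(evt, 2, 0, false) yields 100.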