This patchset moves the task_mm_cid_work to a preemptible and migratable
context. This reduces the impact of this task on the scheduling latency
of real-time tasks and makes its recurrence more predictable. We also
add optimisations and fixes to make sure task_mm_cid_work works as
intended.

The behaviour causing latency was introduced in commit 223baf9d17f2
("sched: Fix performance regression introduced by mm_cid"), which
added a task work tied to the scheduler tick. That approach presents
two possible issues:
* the task work runs before returning to userspace and, in fact, causes
  a scheduling latency (of a significant order of magnitude in
  PREEMPT_RT)
* periodic tasks with short runtime are less likely to run during the
  tick, hence they might not run the task work at all

Patch 1 allows the mm_cids to be actually compacted when a process
reduces its number of threads, which was not the case before since the
same mm_cids were reused to improve cache locality; more details in [3].

Patch 2 contains the main changes: it removes the task_work on the
scheduler tick and uses a delayed_work instead. Additionally, we
terminate the call immediately if we see that no mm_cid is actually
active, which can happen for processes that have been sleeping for a
long time or that have exited but whose mm has not been freed yet.

Patch 3 adds a selftest to validate the functionality of
task_mm_cid_work (i.e. to compact the mm_cids). The test fails if
patch 1 is not applied and is flaky without patch 2. We expect it to
always pass with the entire patchset applied.
Changes since V3 [1]:
* Fixes on the selftest
* Minor style issues in comments and indentation
* Use of perror where possible
* Add a barrier to align threads execution
* Improve test failure and error handling

Changes since V2 [2]:
* Change the order of the patches
* Merge patches changing the main delayed_work logic
* Improved self-test to spawn 1 less thread and use the main one instead

Changes since V1 [3]:
* Re-arm the delayed_work at each invocation
* Cancel the work synchronously at mmdrop
* Remove next scan fields and completely rely on the delayed_work
* Shrink mm_cid allocation with nr thread/affinity (Mathieu Desnoyers)
* Add self test

Overhead comparison in [3]

[1] - https://lore.kernel.org/linux-kernel/20241216130909.240042-1-gmonaco@xxxxxxxxxx/
[2] - https://lore.kernel.org/linux-kernel/20241213095407.271357-1-gmonaco@xxxxxxxxxx/
[3] - https://lore.kernel.org/linux-kernel/20241205083110.180134-2-gmonaco@xxxxxxxxxx/

Gabriele Monaco (2):
  sched: Move task_mm_cid_work to mm delayed work
  rseq/selftests: Add test for mm_cid compaction

Mathieu Desnoyers (1):
  sched: Compact RSEQ concurrency IDs with reduced threads and affinity

 include/linux/mm_types.h                      |  23 ++-
 include/linux/sched.h                         |   1 -
 kernel/sched/core.c                           |  66 +------
 kernel/sched/sched.h                          |  32 ++-
 tools/testing/selftests/rseq/.gitignore       |   1 +
 tools/testing/selftests/rseq/Makefile         |   2 +-
 .../selftests/rseq/mm_cid_compaction_test.c   | 185 ++++++++++++++++++
 7 files changed, 231 insertions(+), 79 deletions(-)
 create mode 100644 tools/testing/selftests/rseq/mm_cid_compaction_test.c

base-commit: 5bc55a333a2f7316b58edc7573e8e893f7acb532
-- 
2.47.1