[ Please make sure to CC everyone for patch 0/3 next time. Added CCs. ] On 2025-02-10 08:56, Gabriele Monaco wrote:
This patchset moves the task_mm_cid_work to a preemptible and migratable context. This reduces the impact of this task to the scheduling latency of real time tasks. The change makes the recurrence of the task a bit more predictable. We also add optimisation and fixes to make sure the task_mm_cid_work works as intended. The behaviour causing latency was introduced in commit 223baf9d17f2 ("sched: Fix performance regression introduced by mm_cid") which introduced a task work tied to the scheduler tick. That approach presents two possible issues: * the task work runs before returning to user and causes, in fact, a scheduling latency (with order of magnitude significant in PREEMPT_RT) * periodic tasks with short runtime are less likely to run during the tick, hence they might not run the task work at all Patch 1 allows the mm_cids to be actually compacted when a process reduces its number of threads, which was not the case since the same mm_cids were reused to improve cache locality, more details in [4]. Patch 2 contains the main changes, removing the task_work on the scheduler tick and using a delayed_work instead. Additionally, we terminate the call immediately if we see that no mm_cid is actually active, which could happen on processes sleeping for long time or which exited but whose mm has not been freed yet. Patch 3 adds a selftest to validate the functionality of the task_mm_cid_work (i.e. to compact the mm_cids). The test fails if patch 1 is not applied and is flaky without patch 2. We expect it to always pass with the entire patchset applied.
There is a tiny nit in a comment in patch 3. Other than that the series looks good. Please re-send with this fixed and please add my reviewed by tag to patch 3. Once updated, Peter, Ingo, you have my blessing to pull this into tip. Thanks! Mathieu
Changes since V4 [1]: * Fixes on the selftest * Polished memory allocation and cleanup * Handle the test failure in main Changes since V3 [2]: * Fixes on the selftest * Minor style issues in comments and indentation * Use of perror where possible * Add a barrier to align threads execution * Improve test failure and error handling Changes since V2 [3]: * Change the order of the patches * Merge patches changing the main delayed_work logic * Improved self-test to spawn 1 less thread and use the main one instead Changes since V1 [4]: * Re-arm the delayed_work at each invocation * Cancel the work synchronously at mmdrop * Remove next scan fields and completely rely on the delayed_work * Shrink mm_cid allocation with nr thread/affinity (Mathieu Desnoyers) * Add self test Overhead comparison in [4] [1] - https://lore.kernel.org/lkml/20250113074231.61638-4-gmonaco@xxxxxxxxxx [2] - https://lore.kernel.org/lkml/20241216130909.240042-1-gmonaco@xxxxxxxxxx [3] - https://lore.kernel.org/lkml/20241213095407.271357-1-gmonaco@xxxxxxxxxx [4] - https://lore.kernel.org/lkml/20241205083110.180134-2-gmonaco@xxxxxxxxxx To: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> To: Peter Zijlstra <peterz@xxxxxxxxxxxxx> To: Ingo Molnar <mingo@xxxxxxxxxx> Gabriele Monaco (2): sched: Move task_mm_cid_work to mm delayed work rseq/selftests: Add test for mm_cid compaction Mathieu Desnoyers (1): sched: Compact RSEQ concurrency IDs with reduced threads and affinity include/linux/mm_types.h | 23 +- include/linux/sched.h | 1 - kernel/sched/core.c | 66 +----- kernel/sched/sched.h | 32 ++- tools/testing/selftests/rseq/.gitignore | 1 + tools/testing/selftests/rseq/Makefile | 2 +- .../selftests/rseq/mm_cid_compaction_test.c | 200 ++++++++++++++++++ 7 files changed, 246 insertions(+), 79 deletions(-) create mode 100644 tools/testing/selftests/rseq/mm_cid_compaction_test.c base-commit: a64dcfb451e254085a7daee5fe51bf22959d52d3
-- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com