> On Mon, Jul 29, 2024 at 7:29 PM Uladzislau Rezki <urezki@xxxxxxxxx> wrote:
> > It would be really good if Adrian could run the "compiling workload"
> > on his big system and post the statistics here.
> >
> > For example:
> >   a) v6.11-rc1 + KASAN.
> >   b) v6.11-rc1 + KASAN + patch.
>
> Sure, please see the statistics below.
>
> Test Result (based on 6.11-rc1)
> ===============================
>
> 1. Profile purge_vmap_node()
>
>    A. Command: trace-cmd record -p function_graph -l purge_vmap_node \
>                make -j $(nproc)
>
>    B. Average execution time of purge_vmap_node():
>
>         no patch (us)    patched (us)    saved
>         -------------    ------------    -----
>         147885.02        3692.51         97%
>
>    C. Total execution time of purge_vmap_node():
>
>         no patch (us)    patched (us)    saved
>         -------------    ------------    -----
>         194173036        5114138         97%
>
>    [ftrace log] Without patch: https://gist.github.com/AdrianHuang/a5bec861f67434e1024bbf43cea85959
>    [ftrace log] With patch: https://gist.github.com/AdrianHuang/a200215955ee377288377425dbaa04e3
>
> 2. Use the `time` utility to measure execution time
>
>    A. Command: make clean && time make -j $(nproc)
>
>    B. The result below is the average kernel execution time of five
>       measurements ('sys' field of the `time` output):
>
>         no patch (seconds)    patched (seconds)    saved
>         ------------------    -----------------    -----
>         36932.904             31403.478            15%
>
>    [`time` log] Without patch: https://gist.github.com/AdrianHuang/987b20fd0bd2bb616b3524aa6ee43112
>    [`time` log] With patch: https://gist.github.com/AdrianHuang/da2ea4e6aa0b4dcc207b4e40b202f694

I meant other statistics. As noted here:

  https://lore.kernel.org/linux-mm/ZogS_04dP5LlRlXN@pc636/T/#m5d57f11d9f69aef5313f4efbe25415b3bae4c818

I came to the conclusion that the place and lock below:

<snip>
static void exit_notify(struct task_struct *tsk, int group_dead)
{
	bool autoreap;
	struct task_struct *p, *n;
	LIST_HEAD(dead);

	write_lock_irq(&tasklist_lock);
	...
<snip>

keep IRQs disabled, which means that purge_vmap_node() does make
progress, but it can be slow:

  CPU_1: disables IRQs while trying to grab the tasklist_lock
  CPU_2: sends an IPI to CPU_1 and waits until the specified callback
         has been executed on CPU_1

Since CPU_1 has IRQs disabled, serving the IPI and completing the
callback are delayed until CPU_1 re-enables IRQs.

Could you please post lock statistics for the kernel-compiling use case?
KASAN + patch is enough, IMO. This is just to double-check whether the
tasklist_lock is a problem or not.

Thanks!

--
Uladzislau Rezki
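P.S. As a side note (not part of the original report), the "saved"
columns in the tables above can be reproduced from the raw numbers with
a quick check; `saved_pct` is just an illustrative helper, the inputs
are the figures quoted above:

```python
def saved_pct(before, after):
    """Percentage of 'before' eliminated by the patch."""
    return (before - after) / before * 100

avg = saved_pct(147885.02, 3692.51)         # purge_vmap_node() average
total = saved_pct(194173036, 5114138)       # purge_vmap_node() total
sys_time = saved_pct(36932.904, 31403.478)  # 'sys' field of `time`

# prints: average: 97.5%  total: 97.4%  sys time: 15.0%
print(f"average: {avg:.1f}%  total: {total:.1f}%  sys time: {sys_time:.1f}%")
```

This agrees with the reported 97% / 97% / 15% savings.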
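The CPU_1/CPU_2 interaction described above can be sketched in
userspace with two threads standing in for the two CPUs. Everything
here (names, events, the timing constant) is an illustrative analogy,
not kernel code: the point is only that the "IPI" sender is stalled for
as long as the receiver keeps "interrupts" masked.

```python
import threading
import time

ipi_pending = threading.Event()   # CPU_2 -> CPU_1: "run my callback"
ipi_done = threading.Event()      # CPU_1 -> CPU_2: "callback finished"

CRITICAL_SECTION_SECS = 0.2       # time CPU_1 spends with "IRQs" off

def cpu_1():
    # write_lock_irq() analogue: a critical section during which the
    # pending "IPI" is deliberately ignored ("IRQs" are masked).
    time.sleep(CRITICAL_SECTION_SECS)
    # "IRQs" re-enabled: the pending IPI is finally serviced.
    ipi_pending.wait()
    ipi_done.set()

def cpu_2():
    # Analogue of an IPI-based call that waits for completion: request
    # the callback, then block until CPU_1 has actually run it.
    ipi_pending.set()
    ipi_done.wait()

t1 = threading.Thread(target=cpu_1)
t2 = threading.Thread(target=cpu_2)
start = time.monotonic()
t1.start(); t2.start()
t1.join(); t2.join()
elapsed = time.monotonic() - start

# CPU_2's wait is bounded below by CPU_1's critical section.
print(f"CPU_2 waited ~{elapsed:.2f}s "
      f"(critical section: {CRITICAL_SECTION_SECS}s)")
```

However short the callback itself is, `elapsed` can never drop below
the time the receiver keeps interrupts disabled, which is why long
IRQs-off sections under tasklist_lock show up as slow purge progress.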