Hello,

Sorry about the delay. I was wondering what happened to this thread
earlier in the morning and realized that I missed your reply. Thanks for
the dump and instruction. I was only testing on NUMA setups and made a
silly mistake. Can you please try the following patch?

------ 8< ------
workqueue: The default node_nr_active should have its max set to max_active

The default nna (node_nr_active) is used when the pool isn't tied to a
specific NUMA node. This can happen in the following cases:

 1. On NUMA, if per-node pwq init fails and the fallback pwq is used.

 2. On NUMA, if a pool is configured to span multiple nodes.

 3. On single node setups.

5797b1c18919 ("workqueue: Implement system-wide nr_active enforcement for
unbound workqueues") set the default nna->max to min_active because only #1
was being considered. For #2 and #3, using min_active means that the max
concurrency in normal operation is pushed down to min_active, which is
currently 8, which can obviously lead to performance issues.

#1 is very unlikely to happen to begin with, and even when it does, which
exact value nna->max is set to doesn't really matter. #2 can only happen if
the workqueue is intentionally configured to ignore NUMA boundaries, and
there's no good way to distribute max_active in this case. #3 is the default
behavior on single node machines.

Let's set the default nna->max to max_active. This fixes the artificially
lowered concurrency problem on single node machines and shouldn't hurt
anything for other cases.

Signed-off-by: Tejun Heo <tj@xxxxxxxxxx>
Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@xxxxxxx>
Fixes: 5797b1c18919 ("workqueue: Implement system-wide nr_active enforcement for unbound workqueues")
Link: http://lkml.kernel.org/r/20240410082822.2131994-1-shinichiro.kawasaki@xxxxxxx
---
 kernel/workqueue.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 0066c8f6c154..f94ae51c6f2b 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1606,7 +1606,7 @@ static void wq_update_node_max_active(struct workqueue_struct *wq, int off_cpu)
 			      min_active, max_active);
 	}
 
-	wq_node_nr_active(wq, NUMA_NO_NODE)->max = min_active;
+	wq_node_nr_active(wq, NUMA_NO_NODE)->max = max_active;
 }
 
 /**