Hi Huang,

On 25/02/22 08:02, Huang, Ying wrote:
We have run into a memory hotplug regression before.  Let's check
whether the problem is similar.  Can you try the below debug patch?

Best Regards,
Huang, Ying

----------------------------8<------------------------------------------
From 500c0b53436b7a697ed5d77241abbc0d5d3cfc07 Mon Sep 17 00:00:00 2001
From: Huang Ying <ying.huang@xxxxxxxxx>
Date: Wed, 29 Sep 2021 10:57:19 +0800
Subject: [PATCH] mm/migrate: Debug CPU hotplug regression

Signed-off-by: "Huang, Ying" <ying.huang@xxxxxxxxx>
---
 mm/migrate.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index c7da064b4781..c4805f15e616 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -3261,15 +3261,17 @@ static int __meminit migrate_on_reclaim_callback(struct notifier_block *self,
  * The ordering is also currently dependent on which nodes have
  * CPUs.  That means we need CPU on/offline notification too.
  */
-static int migration_online_cpu(unsigned int cpu)
+static int migration_cpu_hotplug(unsigned int cpu)
 {
-	set_migration_target_nodes();
-	return 0;
-}
+	static int nr_cpu_node_saved;
+	int nr_cpu_node;
+
+	nr_cpu_node = num_node_state(N_CPU);
+	if (nr_cpu_node != nr_cpu_node_saved) {
+		set_migration_target_nodes();
+		nr_cpu_node_saved = nr_cpu_node;
+	}
 
-static int migration_offline_cpu(unsigned int cpu)
-{
-	set_migration_target_nodes();
 	return 0;
 }
 
@@ -3283,7 +3285,7 @@ static int __init migrate_on_reclaim_init(void)
 	WARN_ON(!node_demotion);
 
 	ret = cpuhp_setup_state_nocalls(CPUHP_MM_DEMOTION_DEAD, "mm/demotion:offline",
-					NULL, migration_offline_cpu);
+					NULL, migration_cpu_hotplug);
 	/*
 	 * In the unlikely case that this fails, the automatic
 	 * migration targets may become suboptimal for nodes
@@ -3292,7 +3294,7 @@ static int __init migrate_on_reclaim_init(void)
 	 */
 	WARN_ON(ret < 0);
 	ret = cpuhp_setup_state(CPUHP_AP_MM_DEMOTION_ONLINE, "mm/demotion:online",
-				migration_online_cpu, NULL);
+				migration_cpu_hotplug, NULL);
 	WARN_ON(ret < 0);
 
 	hotplug_memory_notifier(migrate_on_reclaim_callback, 100);

This works. I applied it on a 5.15 kernel and don't see any regression
compared to the 5.14 kernel.

Have you posted this patch yet? Or are there any plans to include a
similar patch?