Hi Huang, >+void set_numabalancing_state(bool enabled) >+{ >+ if (enabled) >+ sysctl_numa_balancing_mode = NUMA_BALANCING_NORMAL; >+ else >+ sysctl_numa_balancing_mode = NUMA_BALANCING_DISABLED; >+ __set_numabalancing_state(enabled); >+} >+ One of the properties of optimized NUMA Balancing for tiered memory is we are not going to scan top-tier nodes as promotion doesn't make sense there (implemented in the next patch [3/6]). However, if a system has only single memory node with CPU, does it make sense to run `NUMA_BALANCING_NORMAL` mode there? What do you think about downgrading to `NUMA_BALANCING_MEMORY_TIERING` mode if a user setup NUMA Balancing on the default mode of `NUMA_BALANCING_NORMAL` on a single toptier memory node? >diff --git a/mm/vmscan.c b/mm/vmscan.c >index c266e64d2f7e..5edb5dfa8900 100644 >--- a/mm/vmscan.c >+++ b/mm/vmscan.c >@@ -56,6 +56,7 @@ > > #include <linux/swapops.h> > #include <linux/balloon_compaction.h> >+#include <linux/sched/sysctl.h> > > #include "internal.h" > >@@ -3919,6 +3920,12 @@ static bool pgdat_watermark_boosted(pg_data_t *pgdat, int highest_zoneidx) > return false; > } > >+/* >+ * Keep the free pages on fast memory node a little more than the high >+ * watermark to accommodate the promoted pages. >+ */ >+#define NUMA_BALANCING_PROMOTE_WATERMARK (10UL * 1024 * 1024 >> PAGE_SHIFT) >+ > /* > * Returns true if there is an eligible zone balanced for the request order > * and highest_zoneidx >@@ -3940,6 +3947,15 @@ static bool pgdat_balanced(pg_data_t *pgdat, int order, int highest_zoneidx) > continue; > > mark = high_wmark_pages(zone); >+ if (sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING && >+ numa_demotion_enabled && >+ next_demotion_node(pgdat->node_id) != NUMA_NO_NODE) { >+ unsigned long promote_mark; >+ >+ promote_mark = min(NUMA_BALANCING_PROMOTE_WATERMARK, >+ pgdat->node_present_pages >> 6); >+ mark += promote_mark; >+ } > if (zone_watermark_ok_safe(zone, order, mark, highest_zoneidx)) > return true; > } This can be moved to a different patch. I think, this patch [2/6] can be splitted into two basic patches -- 1. NUMA Balancing interface for tiered memory and 2. maintaining a headroom for promotion. Instead of having a static value for `NUMA_BALANCING_PROMOTE_WATERMARK` what about decoupling the allocation and reclamation and add a user-space interface for controling them? Do you think patch [2/5] and [3/5] of this series can be merged to your current patchset? https://lore.kernel.org/all/cover.1637778851.git.hasanalmaruf@xxxxxx/ Best, Hasan