Re: [RFC PATCH 2/4] mm: kpromoted: Hot page info collection and promotion daemon

On 14-Mar-25 2:06 AM, Davidlohr Bueso wrote:
On Thu, 06 Mar 2025, Bharata B Rao wrote:

+/*
+ * Go through the page hotness information and migrate pages if required.
+ *
+ * Promoted pages are no longer tracked in the hot list.
+ * Cold pages are pruned from the list as well.
+ *
+ * TODO: Batching could be done
+ */
+static void kpromoted_migrate(pg_data_t *pgdat)
+{
+    int nid = pgdat->node_id;
+    struct page_hotness_info *phi;
+    struct hlist_node *tmp;
+    int nr_bkts = HASH_SIZE(page_hotness_hash);
+    int bkt;
+
+    for (bkt = 0; bkt < nr_bkts; bkt++) {
+        mutex_lock(&page_hotness_lock[bkt]);
+        hlist_for_each_entry_safe(phi, tmp, &page_hotness_hash[bkt], hnode) {
+            if (phi->hot_node != nid)
+                continue;
+
+            if (page_should_be_promoted(phi)) {
+                count_vm_event(KPROMOTED_MIG_CANDIDATE);
+                if (!kpromote_page(phi)) {
+                    count_vm_event(KPROMOTED_MIG_PROMOTED);
+                    hlist_del_init(&phi->hnode);
+                    kfree(phi);
+                }
+            } else {
+                /*
+                 * Not a suitable page or cold page, stop tracking it.
+                 * TODO: Identify cold pages and drive demotion?
+                 */

I don't think kpromoted should drive demotion at all. No one is complaining about migrate in lieu of discard, and there is also proactive reclaim which users can trigger. All the in-kernel problems are wrt promotion. The simpler any of these kthreads are the better.
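For reference, the user-triggered proactive reclaim mentioned above is available through the cgroup v2 memory.reclaim interface; a minimal example (the cgroup path is illustrative, substitute the group of interest):

```shell
# Ask the kernel to attempt to reclaim 1G from this cgroup (cgroup v2).
# On tiered-memory systems this reclaim can drive demotion to lower tiers.
echo "1G" > /sys/fs/cgroup/example-group/memory.reclaim
```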

I was testing on the default kernel with NUMA balancing mode 2.

The multi-threaded application allocates memory on DRAM and the allocation spills over to CXL node. The threads keep accessing allocated memory pages in random order.

pgpromote_success 6
pgpromote_candidate 745387
pgdemote_kswapd 51085
pgdemote_direct 10481
pgdemote_khugepaged 0
numa_pte_updates 27249625
numa_huge_pte_updates 0
numa_hint_faults 9660745
numa_hint_faults_local 0
numa_pages_migrated 6
numa_node_full 745438
pgmigrate_success 2225458
pgmigrate_fail 1187349

I hardly see any promotion happening.

In order to check the number of times the top-tier node was found to be full when attempting to promote, I added a numa_node_full counter like below:

diff --git a/mm/migrate.c b/mm/migrate.c
index fb19a18892c8..4d049d896589 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2673,6 +2673,7 @@ int migrate_misplaced_folio_prepare(struct folio *folio,
        if (!migrate_balanced_pgdat(pgdat, nr_pages)) {
                int z;

+               count_vm_event(NUMA_NODE_FULL);
                if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING))
                        return -EAGAIN;
                for (z = pgdat->nr_zones - 1; z >= 0; z--) {


As seen above, numa_node_full is 745438, which matches the pgpromote_candidate count.

I do see counters reporting kswapd-driven and direct demotion as well, but does this mean that demotion isn't happening fast enough to cope with the promotion demand in this situation of high top-tier memory pressure?

Regards,
Bharata.



