On Mon, Mar 17, 2025 at 09:09:18AM +0530, Bharata B Rao wrote: > On 13-Mar-25 10:14 PM, Davidlohr Bueso wrote: > > On Thu, 06 Mar 2025, Bharata B Rao wrote: > > > > > +static int page_should_be_promoted(struct page_hotness_info *phi) > > > +{ > > > + struct page *page = pfn_to_online_page(phi->pfn); > > > + unsigned long now = jiffies; > > > + struct folio *folio; > > > + > > > + if (!page || is_zone_device_page(page)) > > > + return false; > > > + > > > + folio = page_folio(page); > > > + if (!folio_test_lru(folio)) { > > > + count_vm_event(KPROMOTED_MIG_NON_LRU); > > > + return false; > > > + } > > > + if (folio_nid(folio) == phi->hot_node) { > > > + count_vm_event(KPROMOTED_MIG_RIGHT_NODE); > > > + return false; > > > + } > > > > How about using the LRU age itself: > > Sounds like a good check for page hotness. > > > > > if (folio_test_active()) > > return true; > > But the numbers I obtained with this check added, didn't really hit this > condition all that much. I was running a multi-threaded application that > allocates enough memory such that the allocation spills over from DRAM node > to the CXL node. Threads keep touching the memory pages in random order. > Is demotion enabled by any chance? i.e. are you sure it's actually allocating from CXL and not demoting cold stuff to CXL? > kpromoted_recorded_accesses 960620 /* Number of recorded accesses */ > kpromoted_recorded_hwhints 960620 /* Nr accesses via HW hints, IBS in this > case */ > kpromoted_recorded_pgtscans 0 > kpromoted_record_toptier 638006 /* Nr toptier accesses */ > kpromoted_record_added 321234 /* Nr (CXL) accesses that are tracked */ > kpromoted_record_exists 1380 > kpromoted_mig_right_node 0 > kpromoted_mig_non_lru 226 > kpromoted_mig_lru_active 47 /* Number of accesses considered for promotion > as determined by folio_test_active() check */ > kpromoted_mig_cold_old 0 > kpromoted_mig_cold_not_accessed 1373 > kpromoted_mig_candidate 319635 > kpromoted_mig_promoted 319635 > kpromoted_mig_dropped 1599 > > Need to check why is this the case. > > > > > > + > > > + /* If the page was hot a while ago, don't promote */ > > > + if ((now - phi->last_update) > 2 * > > > msecs_to_jiffies(KPROMOTED_FREQ_WINDOW)) { > > > + count_vm_event(KPROMOTED_MIG_COLD_OLD); > > > + return false; > > > + } > > > + > > > + /* If the page hasn't been accessed enough number of times, > > > don't promote */ > > > + if (phi->frequency < KPRMOTED_FREQ_THRESHOLD) { > > > + count_vm_event(KPROMOTED_MIG_COLD_NOT_ACCESSED); > > > + return false; > > > + } > > > + return true; > > > +} > > > > ... > > > > > +static int kpromoted(void *p) > > > +{ > > > + pg_data_t *pgdat = (pg_data_t *)p; > > > + struct task_struct *tsk = current; > > > + long timeout = msecs_to_jiffies(KPROMOTE_DELAY); > > > + > > > + const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id); > > > + > > > + if (!cpumask_empty(cpumask)) > > > + set_cpus_allowed_ptr(tsk, cpumask); > > > > Explicit cpumasks are not needed if you use kthread_create_on_node(). > > Thanks, will incorporate. > > Regards, > Bharata.