Re: [RFC PATCH 2/4] mm: kpromoted: Hot page info collection and promotion daemon

Gregory Price <gourry@xxxxxxxxxx> · Mon, 17 Mar 2025 11:05:08 -0400

On Mon, Mar 17, 2025 at 09:09:18AM +0530, Bharata B Rao wrote:
> On 13-Mar-25 10:14 PM, Davidlohr Bueso wrote:
> > On Thu, 06 Mar 2025, Bharata B Rao wrote:
> > 
> > > +static int page_should_be_promoted(struct page_hotness_info *phi)
> > > +{
> > > +    struct page *page = pfn_to_online_page(phi->pfn);
> > > +    unsigned long now = jiffies;
> > > +    struct folio *folio;
> > > +
> > > +    if (!page || is_zone_device_page(page))
> > > +        return false;
> > > +
> > > +    folio = page_folio(page);
> > > +    if (!folio_test_lru(folio)) {
> > > +        count_vm_event(KPROMOTED_MIG_NON_LRU);
> > > +        return false;
> > > +    }
> > > +    if (folio_nid(folio) == phi->hot_node) {
> > > +        count_vm_event(KPROMOTED_MIG_RIGHT_NODE);
> > > +        return false;
> > > +    }
> > 
> > How about using the LRU age itself:
> 
> Sounds like a good check for page hotness.
> 
> > 
> > if (folio_test_active())
> >     return true;
> 
> But the numbers I obtained with this check added, didn't really hit this
> condition all that much. I was running a multi-threaded application that
> allocates enough memory such that the allocation spills over from DRAM node
> to the CXL node. Threads keep touching the memory pages in random order.
> 

Is demotion enabled by any chance?

i.e. are you sure it's actually allocating from CXL and not demoting
cold stuff to CXL?

> kpromoted_recorded_accesses 960620 /* Number of recorded accesses */
> kpromoted_recorded_hwhints 960620  /* Nr accesses via HW hints, IBS in this
> case */
> kpromoted_recorded_pgtscans 0
> kpromoted_record_toptier 638006 /* Nr toptier accesses */
> kpromoted_record_added 321234 /* Nr (CXL) accesses that are tracked */
> kpromoted_record_exists 1380
> kpromoted_mig_right_node 0
> kpromoted_mig_non_lru 226
> kpromoted_mig_lru_active 47 /* Number of accesses considered for promotion
> as determined by folio_test_active() check */
> kpromoted_mig_cold_old 0
> kpromoted_mig_cold_not_accessed 1373
> kpromoted_mig_candidate 319635
> kpromoted_mig_promoted 319635
> kpromoted_mig_dropped 1599
> 
> Need to check why is this the case.
> 
> > 
> > > +
> > > +    /* If the page was hot a while ago, don't promote */
> > > +    if ((now - phi->last_update) > 2 *
> > > msecs_to_jiffies(KPROMOTED_FREQ_WINDOW)) {
> > > +        count_vm_event(KPROMOTED_MIG_COLD_OLD);
> > > +        return false;
> > > +    }
> > > +
> > > +    /* If the page hasn't been accessed enough number of times,
> > > don't promote */
> > > +    if (phi->frequency < KPRMOTED_FREQ_THRESHOLD) {
> > > +        count_vm_event(KPROMOTED_MIG_COLD_NOT_ACCESSED);
> > > +        return false;
> > > +    }
> > > +    return true;
> > > +}
> > 
> > ...
> > 
> > > +static int kpromoted(void *p)
> > > +{
> > > +    pg_data_t *pgdat = (pg_data_t *)p;
> > > +    struct task_struct *tsk = current;
> > > +    long timeout = msecs_to_jiffies(KPROMOTE_DELAY);
> > > +
> > > +    const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id);
> > > +
> > > +    if (!cpumask_empty(cpumask))
> > > +        set_cpus_allowed_ptr(tsk, cpumask);
> > 
> > Explicit cpumasks are not needed if you use kthread_create_on_node().
> 
> Thanks, will incorporate.
> 
> Regards,
> Bharata.