Some measurement techniques will only give you an access time in the
sense that the access fell somewhere within a measurement window
(potentially a long one if you are looking for consistent hotness over
minutes).

> 
> Also, the access time provided some sources may at best be
> considered approximate. This is especially true for hot pages
> detected by PTE A bit scanning.
> 
> kpromoted currently maintains the hot PFN records in hash lists
> hashed by PFN value. Each record stores the following info:
> 
> struct page_hotness_info {
> 	unsigned long pfn;
> 
> 	/* Time when this record was updated last */
> 	unsigned long last_update;
> 
> 	/*
> 	 * Number of times this page was accessed in the
> 	 * current window
I'd state here how that window is defined. (I had to read on
to answer the question I had at this point!)

> 	 */
> 	int frequency;
> 
> 	/* Most recent access time */
> 	unsigned long recency;

Put this next to last_update so all the times are together.

> 
> 	/* Most recent access from this node */
> 	int hot_node;

Probably want to relax the 'most recent' part.  I'd guess
the ideal here would be the node that has been accessing the page
the most over the 'recent' past.

> 
> 	struct hlist_node hnode;
> };
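
To make the layout comments above concrete, this is roughly what I had in
mind. It is only a sketch: the field order and comment wording are my
suggestion, the window definition is how I read the KPROMOTED_FREQ_WINDOW
check further down, and struct hlist_node is stubbed so the snippet builds
outside the kernel.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's struct hlist_node (illustration only). */
struct hlist_node {
	struct hlist_node *next, **pprev;
};

struct page_hotness_info {
	unsigned long pfn;

	/* Time when this record was updated last */
	unsigned long last_update;

	/* Most recent access time (kept next to last_update: all times together) */
	unsigned long recency;

	/*
	 * Number of times this page was accessed in the current window.
	 * As I read the patch, a window spans KPROMOTED_FREQ_WINDOW ms
	 * starting at last_update.
	 */
	int frequency;

	/*
	 * Suggested relaxation: the node that accessed this page the most
	 * in the recent past. The patch as posted stores the most recent
	 * accessor instead.
	 */
	int hot_node;

	struct hlist_node hnode;
};
```
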
> 
> The way in which a page is categorized as hot enough to be
> promoted is pretty primitive now.

That bit is very hard even if we solve everything else, and it depends heavily
on workload access pattern stability and migration impact.  Maybe for
'very hot' pages a fairly short period of consistent hotness is
good enough, but it gets much messier if we care about warm pages.
I guess we solve 'very hot' first though, and maybe avoid the phase
transition between an application starting and it reaching steady state
by imposing a wait time on any new userspace process before we
consider moving anything?
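
For the wait-time idea I'm thinking of nothing more elaborate than the check
below. A rough userspace sketch: the helper name and warmup_ticks are made
up, and where the process start time would come from is left open.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: true once a process has been running for longer
 * than warmup_ticks. Times are free-running unsigned tick counters
 * (jiffies-like), so the unsigned subtraction stays correct across
 * counter wraparound.
 */
static bool process_past_warmup(unsigned long start, unsigned long now,
				unsigned long warmup_ticks)
{
	return (now - start) > warmup_ticks;
}
```

Pages belonging to a process that fails this check would simply not be
considered for promotion yet.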

Also worth noting that the mechanism that makes sense for checking whether a
detected hot page is 'stable hot' might use an entirely different tracking
approach from the one used to find it as a candidate.

Whether that requires passing data between hotness trackers, or whether
there is a natural ordering to trackers, is an interesting question.



> diff --git a/mm/kpromoted.c b/mm/kpromoted.c
> new file mode 100644
> index 000000000000..2a8b8495b6b3
> --- /dev/null
> +++ b/mm/kpromoted.c

> +static int page_should_be_promoted(struct page_hotness_info *phi)
> +{
> +	struct page *page = pfn_to_online_page(phi->pfn);
> +	unsigned long now = jiffies;
> +	struct folio *folio;
> +
> +	if (!page || is_zone_device_page(page))
> +		return false;
> +
> +	folio = page_folio(page);
> +	if (!folio_test_lru(folio)) {
> +		count_vm_event(KPROMOTED_MIG_NON_LRU);
> +		return false;
> +	}
> +	if (folio_nid(folio) == phi->hot_node) {
> +		count_vm_event(KPROMOTED_MIG_RIGHT_NODE);
> +		return false;
> +	}
> +
> +	/* If the page was hot a while ago, don't promote */

	/* If the known record of hotness is old, don't promote */ ?

Otherwise this reads as: don't move a page because it was last hot a long
time ago. Maybe it is still hot and we just don't have an update yet?

> +	if ((now - phi->last_update) > 2 * msecs_to_jiffies(KPROMOTED_FREQ_WINDOW)) {
> +		count_vm_event(KPROMOTED_MIG_COLD_OLD);
> +		return false;
> +	}
> +
> +	/* If the page hasn't been accessed enough number of times, don't promote */
> +	if (phi->frequency < KPRMOTED_FREQ_THRESHOLD) {
> +		count_vm_event(KPROMOTED_MIG_COLD_NOT_ACCESSED);
> +		return false;
> +	}
> +	return true;
> +}
> +
> +/*
> + * Go thro' page hotness information and migrate pages if required.
> + *
> + * Promoted pages are not longer tracked in the hot list.
> + * Cold pages are pruned from the list as well.

If these pages are cold, why did we ever see them in the first place?

> + *
> + * TODO: Batching could be done
> + */
> +static void kpromoted_migrate(pg_data_t *pgdat)
> +{
> +	int nid = pgdat->node_id;
> +	struct page_hotness_info *phi;
> +	struct hlist_node *tmp;
> +	int nr_bkts = HASH_SIZE(page_hotness_hash);
> +	int bkt;
> +
> +	for (bkt = 0; bkt < nr_bkts; bkt++) {
> +		mutex_lock(&page_hotness_lock[bkt]);
> +		hlist_for_each_entry_safe(phi, tmp, &page_hotness_hash[bkt], hnode) {
> +			if (phi->hot_node != nid)
> +				continue;
> +
> +			if (page_should_be_promoted(phi)) {
> +				count_vm_event(KPROMOTED_MIG_CANDIDATE);
> +				if (!kpromote_page(phi)) {
> +					count_vm_event(KPROMOTED_MIG_PROMOTED);
> +					hlist_del_init(&phi->hnode);
> +					kfree(phi);
> +				}
> +			} else {
> +				/*
> +				 * Not a suitable page or cold page, stop tracking it.
> +				 * TODO: Identify cold pages and drive demotion?

Coldness tracking is really different from hotness as we need to track what we
didn't see to get the really cold pages. Maybe there is some hint to be had
from the exit of this tracker but I'd definitely not try to tackle both ends
with one approach!

> +				 */
> +				count_vm_event(KPROMOTED_MIG_DROPPED);
> +				hlist_del_init(&phi->hnode);
> +				kfree(phi);
> +			}
> +		}
> +		mutex_unlock(&page_hotness_lock[bkt]);
> +	}
> +}


> +/*
> + * Called by subsystems that generate page hotness/access information.
> + *
> + * Records the memory access info for futher action by kpromoted.
> + */
> +int kpromoted_record_access(u64 pfn, int nid, int src, unsigned long now)
> +{

> +	bkt = hash_min(pfn, KPROMOTED_HASH_ORDER);
> +	mutex_lock(&page_hotness_lock[bkt]);
> +	phi = kpromoted_lookup(pfn, bkt, now);
> +	if (!phi) {
> +		ret = PTR_ERR(phi);
> +		goto out;
> +	}
> +
> +	if ((phi->last_update - now) > msecs_to_jiffies(KPROMOTED_FREQ_WINDOW)) {
> +		/* New window */
> +		phi->frequency = 1; /* TODO: Factor in the history */
> +		phi->last_update = now;
> +	} else {
> +		phi->frequency++;
> +	}
> +	phi->recency = now;
> +
> +	/*
> +	 * TODOs:
> +	 * 1. Source nid is hard-coded for some temperature sources

Hard coded rather than unknown? I'm curious, what source has that issue?

> +	 * 2. Take action if hot_node changes - may be a shared page?
> +	 * 3. Maintain node info for every access within the window?

I guess some sort of saturating counter set might not be too bad.
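
Something along these lines, as a userspace sketch only. The node count,
the 8-bit counter width and all the names here are assumptions for
illustration, not anything from the patch.

```c
#include <stdint.h>

#define SKETCH_MAX_NODES 4	/* assumption: small fixed node count */

struct node_access_counts {
	uint8_t count[SKETCH_MAX_NODES];	/* each saturates at UINT8_MAX */
};

/* Record one access from nid; counters stick at UINT8_MAX instead of wrapping. */
static void node_counts_record(struct node_access_counts *c, int nid)
{
	if (nid < 0 || nid >= SKETCH_MAX_NODES)
		return;		/* unknown source node: not attributed */
	if (c->count[nid] < UINT8_MAX)
		c->count[nid]++;
}

/* Node with the most recorded accesses; lowest nid wins ties, -1 if none. */
static int node_counts_hottest(const struct node_access_counts *c)
{
	int best = -1;
	unsigned int best_count = 0;

	for (int nid = 0; nid < SKETCH_MAX_NODES; nid++) {
		if (c->count[nid] > best_count) {
			best_count = c->count[nid];
			best = nid;
		}
	}
	return best;
}
```

Resetting the counters at each new window would then give you 'the node
accessing it the most recently' without storing every access.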

> +	 */
> +	phi->hot_node = (nid == NUMA_NO_NODE) ? 1 : nid;
> +	mutex_unlock(&page_hotness_lock[bkt]);
> +out:
> +	return 0;

Why store ret and not return it?
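
Roughly what I'd expect the function to look like is below. This is a
userspace sketch, not kernel code: the lock is just a counter, a NULL record
stands in for whatever failure kpromoted_lookup() reports, -ENOMEM is my
assumption for that failure, and the window is in arbitrary ticks. Two other
things differ from the quoted hunk, as far as I can tell from reading it:
the error path here goes through the unlock (the posted 'out' label sits
after mutex_unlock()), and the elapsed time is computed as
now - last_update (the posted code has the operands the other way round,
which wraps as unsigned).

```c
#include <errno.h>
#include <stddef.h>

/* Stand-in for msecs_to_jiffies(KPROMOTED_FREQ_WINDOW); value is arbitrary. */
#define SKETCH_WINDOW_TICKS 100UL

struct hot_record {
	unsigned long last_update;
	unsigned long recency;
	int frequency;
};

static int sketch_lock_depth;	/* stand-in for the per-bucket mutex */

static void sketch_lock(void)   { sketch_lock_depth++; }
static void sketch_unlock(void) { sketch_lock_depth--; }

/* rec == NULL models a failed lookup/allocation of the hotness record. */
static int sketch_record_access(struct hot_record *rec, unsigned long now)
{
	int ret = 0;

	sketch_lock();
	if (!rec) {
		ret = -ENOMEM;	/* assumed error code */
		goto out_unlock;
	}

	/* Elapsed time since the window started; unsigned math copes with wrap. */
	if ((now - rec->last_update) > SKETCH_WINDOW_TICKS) {
		rec->frequency = 1;	/* new window */
		rec->last_update = now;
	} else {
		rec->frequency++;
	}
	rec->recency = now;

out_unlock:
	sketch_unlock();
	return ret;	/* propagate the error instead of always returning 0 */
}
```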

> +}
> +





