Re: [PATCH 36/46] mm: numa: Use a two-stage filter to restrict pages being migrated for unlikely task<->node relationships

Mel Gorman <mgorman@xxxxxxx> · Wed, 21 Nov 2012 19:15:47 +0000

On Wed, Nov 21, 2012 at 07:25:37PM +0100, Ingo Molnar wrote:
> 
> * Mel Gorman <mgorman@xxxxxxx> wrote:
> 
> > While it is desirable that all threads in a process run on its home
> > node, this is not always possible or necessary. There may be more
> > threads than exist within the node or the node might be over-subscribed
> > with unrelated processes.
> > 
> > This can cause a situation whereby a page gets migrated off its home
> > node because the threads clearing pte_numa were running off-node. This
> > patch uses page->last_nid to build a two-stage filter before pages get
> > migrated to avoid problems with short or unlikely task<->node
> > relationships.
> > 
> > Signed-off-by: Mel Gorman <mgorman@xxxxxxx>
> > ---
> >  mm/mempolicy.c |   30 +++++++++++++++++++++++++++++-
> >  1 file changed, 29 insertions(+), 1 deletion(-)
> > 
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index 4c1c8d8..fd20e28 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -2317,9 +2317,37 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long
> >  	}
> >  
> >  	/* Migrate the page towards the node whose CPU is referencing it */
> > -	if (pol->flags & MPOL_F_MORON)
> > +	if (pol->flags & MPOL_F_MORON) {
> > +		int last_nid;
> > +
> >  		polnid = numa_node_id();
> >  
> > +		/*
> > +		 * Multi-stage node selection is used in conjunction
> > +		 * with a periodic migration fault to build a temporal
> > +		 * task<->page relation. By using a two-stage filter we
> > +		 * remove short/unlikely relations.
> > +		 *
> > +		 * Using P(p) ~ n_p / n_t as per frequentist
> > +		 * probability, we can equate a task's usage of a
> > +		 * particular page (n_p) per total usage of this
> > +		 * page (n_t) (in a given time-span) to a probability.
> > +		 *
> > +		 * Our periodic faults will sample this probability and
> > +		 * getting the same result twice in a row, given these
> > +		 * samples are fully independent, is then given by
> > +		 * P(n)^2, provided our sample period is sufficiently
> > +		 * short compared to the usage pattern.
> > +		 *
> > +		 * This quadric squishes small probabilities, making
> > +		 * it less likely we act on an unlikely task<->page
> > +		 * relation.
> > +		 */
> > +		last_nid = page_xchg_last_nid(page, polnid);
> > +		if (last_nid != polnid)
> > +			goto out;
> > +	}
> > +
> >  	if (curnid != polnid)
> >  		ret = polnid;
> >  out:
> 
> As mentioned in my other mail, this patch of yours looks very 
> similar to the numa/core commit attached below, mostly written 
> by Peter:
> 
>   30f93abc6cb3 sched, numa, mm: Add the scanning page fault machinery
> 

My patch is directly based on that particular patch and is a partial
extraction. I could not pull it directly, which is why the From is missing.
I think you'll also find that it's very similar to a partial extraction
from "autonuma: memory follows CPU algorithm and task/mm_autonuma stats
collection". The primary differences are exactly how the logic is applied
and when it happens.

I've added a note to that effect now. For all the patches with notes
or any other ones, I'll be very happy to add the Signed-offs back on if
the original authors acknowledge they are ok with the end result. If you
recall, in the original V1 of this series I said:

	This series steals very heavily from both autonuma and schednuma
	with very little original code. In some cases I removed the
	signed-off-bys because the result was too different. I have noted
	in the changelog where this happened but the signed-offs can be
	restored if the original authors agree.

Just to compare, this is the wording in "autonuma: memory follows CPU
algorithm and task/mm_autonuma stats collection"

+/*
+ * In this function we build a temporal CPU_node<->page relation by
+ * using a two-stage autonuma_last_nid filter to remove short/unlikely
+ * relations.
+ *
+ * Using P(p) ~ n_p / n_t as per frequentest probability, we can
+ * equate a node's CPU usage of a particular page (n_p) per total
+ * usage of this page (n_t) (in a given time-span) to a probability.
+ *
+ * Our periodic faults will then sample this probability and getting
+ * the same result twice in a row, given these samples are fully
+ * independent, is then given by P(n)^2, provided our sample period
+ * is sufficiently short compared to the usage pattern.
+ *
+ * This quadric squishes small probabilities, making it less likely
+ * we act on an unlikely CPU_node<->page relation.
+ */

If this was the basis for the sched/numa patch then I'd point out that
I'm not the only person that failed to preserve history perfectly.
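
For anyone comparing the versions, the filter they have in common can be
sketched in plain userspace C. This is only an illustration under stated
assumptions, not the kernel code: struct fake_page, xchg_last_nid(),
should_migrate() and two_stage_prob() are made-up names standing in for
struct page, page_xchg_last_nid() and the MPOL_F_MORON branch of
mpol_misplaced(), and using -1 for "no previous fault recorded" is an
assumption of the sketch.

```c
/* Hypothetical stand-in for struct page: only the field the filter needs. */
struct fake_page {
	int last_nid;	/* node that took the previous hinting fault, -1 if none (assumed) */
};

/* Hypothetical helper mirroring page_xchg_last_nid(): store nid, return the old value. */
static int xchg_last_nid(struct fake_page *page, int nid)
{
	int old = page->last_nid;

	page->last_nid = nid;
	return old;
}

/*
 * Two-stage filter: only consider migration when two consecutive
 * faults on the page came from the same node.
 *
 * Returns the node to migrate to, or -1 to leave the page where it is.
 */
static int should_migrate(struct fake_page *page, int curnid, int this_nid)
{
	int last_nid = xchg_last_nid(page, this_nid);

	/* Stage one: remember this node, but do not act on a single sample. */
	if (last_nid != this_nid)
		return -1;

	/* Stage two: same node twice in a row, migrate if the page is elsewhere. */
	if (curnid != this_nid)
		return this_nid;

	return -1;
}

/*
 * Chance of passing the filter for a node responsible for n_p of the
 * n_t accesses to a page, assuming independent samples: (n_p / n_t)^2.
 */
static double two_stage_prob(double n_p, double n_t)
{
	double p;

	if (n_t <= 0.0)
		return 0.0;

	p = n_p / n_t;
	return p * p;
}
```

With a page on node 0, a first fault from node 1 only records the node, a
second consecutive fault from node 1 selects node 1 as the target, and an
intervening fault from node 2 resets the filter. Under the independence
assumption from the comment, a node responsible for 10% of the accesses
passes about 1% of the time while one responsible for 90% passes about 81%
of the time.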

-- 
Mel Gorman
SUSE Labs
