On Wed, Nov 21, 2012 at 07:25:37PM +0100, Ingo Molnar wrote:
> 
> * Mel Gorman <mgorman@xxxxxxx> wrote:
> 
> > While it is desirable that all threads in a process run on its home
> > node, this is not always possible or necessary. There may be more
> > threads than exist within the node or the node might over-subscribed
> > with unrelated processes.
> > 
> > This can cause a situation whereby a page gets migrated off its home
> > node because the threads clearing pte_numa were running off-node. This
> > patch uses page->last_nid to build a two-stage filter before pages get
> > migrated to avoid problems with short or unlikely task<->node
> > relationships.
> > 
> > Signed-off-by: Mel Gorman <mgorman@xxxxxxx>
> > ---
> >  mm/mempolicy.c |   30 +++++++++++++++++++++++++++++-
> >  1 file changed, 29 insertions(+), 1 deletion(-)
> > 
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index 4c1c8d8..fd20e28 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -2317,9 +2317,37 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long
> >  	}
> >  
> >  	/* Migrate the page towards the node whose CPU is referencing it */
> > -	if (pol->flags & MPOL_F_MORON)
> > +	if (pol->flags & MPOL_F_MORON) {
> > +		int last_nid;
> > +
> >  		polnid = numa_node_id();
> >  
> > +		/*
> > +		 * Multi-stage node selection is used in conjunction
> > +		 * with a periodic migration fault to build a temporal
> > +		 * task<->page relation. By using a two-stage filter we
> > +		 * remove short/unlikely relations.
> > +		 *
> > +		 * Using P(p) ~ n_p / n_t as per frequentist
> > +		 * probability, we can equate a task's usage of a
> > +		 * particular page (n_p) per total usage of this
> > +		 * page (n_t) (in a given time-span) to a probability.
> > +		 *
> > +		 * Our periodic faults will sample this probability and
> > +		 * getting the same result twice in a row, given these
> > +		 * samples are fully independent, is then given by
> > +		 * P(n)^2, provided our sample period is sufficiently
> > +		 * short compared to the usage pattern.
> > +		 *
> > +		 * This quadric squishes small probabilities, making
> > +		 * it less likely we act on an unlikely task<->page
> > +		 * relation.
> > +		 */
> > +		last_nid = page_xchg_last_nid(page, polnid);
> > +		if (last_nid != polnid)
> > +			goto out;
> > +	}
> > +
> >  	if (curnid != polnid)
> >  		ret = polnid;
> >  out:
> 
> As mentioned in my other mail, this patch of yours looks very 
> similar to the numa/core commit attached below, mostly written 
> by Peter:
> 
>   30f93abc6cb3 sched, numa, mm: Add the scanning page fault machinery
> 

My patch is directly based on that particular patch and is a partial
extraction. I could not pull it directly, which is why the From is
missing. I think you'll also find that it's very similar to a partial
extraction from "autonuma: memory follows CPU algorithm and
task/mm_autonuma stats collection". The primary differences are exactly
how the logic is applied and when it happens. I've now added a note to
that effect.

For all the patches with notes, or any other ones, I'll be very happy to
add the Signed-offs back on if the original authors acknowledge they are
ok with the end result.

If you recall, in the original V1 of this series I said:

	This series steals very heavily from both autonuma and schednuma
	with very little original code. In some cases I removed the
	signed-off-bys because the result was too different. I have noted
	in the changelog where this happened but the signed-offs can be
	restored if the original authors agree.

Just to compare, this is the wording in "autonuma: memory follows CPU
algorithm and task/mm_autonuma stats collection":

+/*
+ * In this function we build a temporal CPU_node<->page relation by
+ * using a two-stage autonuma_last_nid filter to remove short/unlikely
+ * relations.
+ *
+ * Using P(p) ~ n_p / n_t as per frequentest probability, we can
+ * equate a node's CPU usage of a particular page (n_p) per total
+ * usage of this page (n_t) (in a given time-span) to a probability.
+ *
+ * Our periodic faults will then sample this probability and getting
+ * the same result twice in a row, given these samples are fully
+ * independent, is then given by P(n)^2, provided our sample period
+ * is sufficiently short compared to the usage pattern.
+ *
+ * This quadric squishes small probabilities, making it less likely
+ * we act on an unlikely CPU_node<->page relation.
+ */

If this was the basis for the sched/numa patch then I'd point out that
I'm not the only person that failed to preserve history perfectly.

-- 
Mel Gorman
SUSE Labs
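
For anyone following the thread without the tree to hand, the two-stage
filter described in both comments can be sketched in user-space C roughly
as below. This is only an illustration under stated assumptions, not the
code from either series: struct demo_page, demo_xchg_last_nid(),
demo_misplaced() and NID_NONE are hypothetical names, the exchange here is
a plain non-atomic swap, and the real mpol_misplaced() does considerably
more policy handling before it reaches this point.

```c
#include <assert.h>

#define NID_NONE (-1)	/* hypothetical marker: no sample yet / do not migrate */

/* Hypothetical stand-in for struct page: tracks only the last sampled node. */
struct demo_page {
	int last_nid;
};

/*
 * Record this_nid as the most recent sample and return the previous one.
 * Plays the role of page_xchg_last_nid() in the patch, but is NOT atomic.
 */
static int demo_xchg_last_nid(struct demo_page *page, int this_nid)
{
	int prev = page->last_nid;

	assert(this_nid >= 0);
	page->last_nid = this_nid;
	return prev;
}

/*
 * Return the node the page should migrate to, or NID_NONE to leave it
 * where it is. A migration is only proposed when two consecutive
 * sampled faults came from the same node.
 */
static int demo_misplaced(struct demo_page *page, int curnid, int this_nid)
{
	int last_nid = demo_xchg_last_nid(page, this_nid);

	/* Stage one: the previous sample disagrees, so only record this one. */
	if (last_nid != this_nid)
		return NID_NONE;

	/* Stage two: same node twice in a row and the page lives elsewhere. */
	if (curnid != this_nid)
		return this_nid;

	return NID_NONE;
}
```

With a page resident on node 0, a first fault from node 1 only records the
sample, and a second consecutive fault from node 1 proposes migration to
node 1. Assuming independent samples as the comments do, a node responsible
for 10% of the sampled faults passes the filter with probability of about
0.1^2 = 0.01, while one responsible for 90% passes with about 0.9^2 = 0.81,
which is the "squishing" of small probabilities that both comments refer to.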