* Mel Gorman <mgorman@xxxxxxx> [2013-07-15 16:20:10]:

> A preferred node is selected based on the node the most NUMA hinting
> faults were incurred on. There is no guarantee that the task is running
> on that node at the time so this patch reschedules the task to run on
> the most idle CPU of the selected node when selected. This avoids
> waiting for the balancer to make a decision.
>
> Signed-off-by: Mel Gorman <mgorman@xxxxxxx>
> ---
>  kernel/sched/core.c  | 17 +++++++++++++++++
>  kernel/sched/fair.c  | 46 +++++++++++++++++++++++++++++++++++++++++++++-
>  kernel/sched/sched.h |  1 +
>  3 files changed, 63 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 5e02507..b67a102 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4856,6 +4856,23 @@ fail:
>  	return ret;
>  }
>
> +#ifdef CONFIG_NUMA_BALANCING
> +/* Migrate current task p to target_cpu */
> +int migrate_task_to(struct task_struct *p, int target_cpu)
> +{
> +	struct migration_arg arg = { p, target_cpu };
> +	int curr_cpu = task_cpu(p);
> +
> +	if (curr_cpu == target_cpu)
> +		return 0;
> +
> +	if (!cpumask_test_cpu(target_cpu, tsk_cpus_allowed(p)))
> +		return -EINVAL;
> +
> +	return stop_one_cpu(curr_cpu, migration_cpu_stop, &arg);

As I noted earlier, this upsets schedstats badly. Can we add a TODO to
this patch noting that schedstats still need to be taken care of?

One alternative I can think of is a per-scheduling-class routine that
gets called here and does whatever that class needs. For example, for
the fair scheduling class it could update the schedstats and also check
for CFS throttling.

Either way, I think this is an issue that needs a fix, or we should
obsolete schedstats.
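To make the per-class routine idea a bit more concrete, here is a rough
userspace model of what I have in mind. It is only a sketch: the names
(sched_class_model, pre_migrate, nr_forced_migrations, the -EAGAIN return
for a throttled task) are made up for illustration and do not exist in the
tree, the allowed-CPU check from the patch is left out, and the final
assignment merely stands in for the stop_one_cpu() call.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; none of these names exist in the kernel tree. */
struct task_model;

struct sched_class_model {
	/* Hypothetical per-class hook, called before a forced migration. */
	int (*pre_migrate)(struct task_model *p, int target_cpu);
};

struct task_model {
	const struct sched_class_model *sclass;
	int cpu;
	bool throttled;				/* stands in for a throttled cfs_rq */
	unsigned long nr_forced_migrations;	/* stands in for a schedstat counter */
};

/* Fair-class hook: refuse if throttled, otherwise account the migration. */
static int fair_pre_migrate(struct task_model *p, int target_cpu)
{
	(void)target_cpu;

	if (p->throttled)
		return -EAGAIN;		/* assumed error code, illustration only */

	p->nr_forced_migrations++;
	return 0;
}

static const struct sched_class_model fair_class_model = {
	.pre_migrate = fair_pre_migrate,
};

/* Model of migrate_task_to() with the class hook added. */
static int migrate_task_to_model(struct task_model *p, int target_cpu)
{
	int ret;

	assert(p != NULL && p->sclass != NULL);

	if (p->cpu == target_cpu)
		return 0;

	if (p->sclass->pre_migrate) {
		ret = p->sclass->pre_migrate(p, target_cpu);
		if (ret)
			return ret;
	}

	p->cpu = target_cpu;	/* the patch calls stop_one_cpu() at this point */
	return 0;
}
```

The point of routing this through the class is that the generic migration
path would not need to know which statistics or throttling state a given
class keeps; a class without a hook is simply migrated as before.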