On Mon, Mar 01, 2010 at 02:14:39AM -0800, David Rientjes wrote: > From: Con Kolivas <kernel@xxxxxxxxxxx> > > When kswapd is awoken due to reclaim by a running task, set the priority > of kswapd to that of the task allocating pages thus making memory reclaim > cpu activity affected by nice level. > Why? When a process kicks kswapd, the watermark at which a process enters direct reclaim has not been reached yet. In other words, there is no guarantee that a process will stall due to memory pressure. The exception would be if there are many high-priority processes allocating pages at a steady rate that are starving kswapd of CPU time and consequently entering direct reclaim. In this case, the high-priority processes effectively should stall until they have reclaimed the pages. As Con is involved, I'm guessing there are high-priority interactive processes that jitter in low-memory situations but as I've never observed such a scenario I'm not sure. My main concern is that in the case there are a mix of high and low processes with kswapd towards the higher priority as a result of this patch, kswapd could be keeping CPU time from low-priority processes that are well behaved that would would make less forward progress as a result of this patch. I'm not against it as such, but I'd like to know more about the problem this solves and what the before and after behaviour looks like. > [rientjes@xxxxxxxxxx: refactor for current] > Cc: Mel Gorman <mel@xxxxxxxxx> > Signed-off-by: Con Kolivas <kernel@xxxxxxxxxxx> > Signed-off-by: David Rientjes <rientjes@xxxxxxxxxx> > --- > mm/vmscan.c | 33 ++++++++++++++++++++++++++++++++- > 1 files changed, 32 insertions(+), 1 deletions(-) > > diff --git a/mm/vmscan.c b/mm/vmscan.c > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -1658,6 +1658,33 @@ static void shrink_zone(int priority, struct zone *zone, > } > > /* > + * Helper functions to adjust nice level of kswapd, based on the priority of > + * the task allocating pages. If it is already higher priority we do not > + * demote its nice level since it is still working on behalf of a higher > + * priority task. With kernel threads we leave it at nice 0. > + * > + * We don't ever run kswapd real time, so if a real time task calls kswapd we > + * set it to highest SCHED_NORMAL priority. > + */ > +static int effective_sc_prio(struct task_struct *p) > +{ > + if (likely(p->mm)) { > + if (rt_task(p)) > + return -20; > + return task_nice(p); > + } > + return 0; > +} > + > +static void set_kswapd_nice(struct task_struct *kswapd, int active) > +{ > + long nice = effective_sc_prio(current); > + > + if (task_nice(kswapd) > nice || !active) > + set_user_nice(kswapd, nice); > +} > + > +/* > * This is the direct reclaim path, for page-allocating processes. We only > * try to reclaim pages from zones which will satisfy the caller's allocation > * request. > @@ -2257,6 +2284,7 @@ static int kswapd(void *p) > } > } > > + set_user_nice(tsk, 0); > order = pgdat->kswapd_max_order; > } > finish_wait(&pgdat->kswapd_wait, &wait); > @@ -2281,6 +2309,7 @@ static int kswapd(void *p) > void wakeup_kswapd(struct zone *zone, int order) > { > pg_data_t *pgdat; > + int active; > > if (!populated_zone(zone)) > return; > @@ -2292,7 +2321,9 @@ void wakeup_kswapd(struct zone *zone, int order) > pgdat->kswapd_max_order = order; > if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL)) > return; > - if (!waitqueue_active(&pgdat->kswapd_wait)) > + active = waitqueue_active(&pgdat->kswapd_wait); > + set_kswapd_nice(pgdat->kswapd, active); > + if (!active) > return; > wake_up_interruptible(&pgdat->kswapd_wait); > } > -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxxx For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>