On Mon, 16 May 2011 16:06:57 +0100
Mel Gorman <mgorman@xxxxxxx> wrote:

> Under constant allocation pressure, kswapd can be in the situation where
> sleeping_prematurely() will always return true even if kswapd has been
> running a long time. Check if kswapd needs to be scheduled.
>
> Signed-off-by: Mel Gorman <mgorman@xxxxxxx>
> Acked-by: Rik van Riel <riel@xxxxxxxxxx>
> ---
>  mm/vmscan.c |    4 ++++
>  1 files changed, 4 insertions(+), 0 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index af24d1e..4d24828 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2251,6 +2251,10 @@ static bool sleeping_prematurely(pg_data_t *pgdat, int order, long remaining,
>  	unsigned long balanced = 0;
>  	bool all_zones_ok = true;
>
> +	/* If kswapd has been running too long, just sleep */
> +	if (need_resched())
> +		return false;
> +
>  	/* If a direct reclaimer woke kswapd within HZ/10, it's premature */
>  	if (remaining)
>  		return true;

I'm a bit worried by this one.

Do we really fully understand why kswapd is continuously running like
this?  The changelog makes me think "no" ;)

Given that the page-allocating process is madly reclaiming pages in
direct reclaim (yes?) and that kswapd is madly reclaiming pages on a
different CPU, we should pretty promptly get into a situation where
kswapd can suspend itself.  But that obviously isn't happening.  So
what *is* going on?

Secondly, taking an up-to-100ms sleep in response to a need_resched()
seems pretty savage and I suspect it risks undesirable side-effects.  A
plain old cond_resched() would be more cautious.  But presumably
kswapd() is already running cond_resched() pretty frequently, so why
didn't that work?
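
For illustration only, the ordering concern can be shown with a small
userspace model of the checks in the quoted hunk.  This is a sketch,
not kernel code: struct kswapd_model, its fields and
model_sleeping_prematurely() are hypothetical names, and the single
zones_balanced flag is an assumed simplification of whatever the rest
of sleeping_prematurely() computes from the zone watermarks.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for the state the patched
 * sleeping_prematurely() consults.  Not kernel code. */
struct kswapd_model {
	bool need_resched;	/* models the result of need_resched() */
	long remaining;		/* models the "remaining" argument */
	bool zones_balanced;	/* assumed summary of the zone checks */
};

/* Mirrors the order of the checks in the quoted hunk: the
 * need_resched test comes first, so it overrides both later checks. */
static bool model_sleeping_prematurely(const struct kswapd_model *m)
{
	if (m->need_resched)
		return false;	/* patched behaviour: sleep regardless */
	if (m->remaining)
		return true;	/* woken early: sleeping is premature */
	return !m->zones_balanced;
}
```

In this model, setting need_resched makes the function return false
even when remaining is nonzero and the zones are unbalanced, which is
the case the reply is asking about: kswapd would go to sleep with
reclaim work still outstanding.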