Re: [PATCH 3/3] mm: slub: Default slub_max_order to 0

James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> · Fri, 13 May 2011 09:16:50 -0500

On Fri, 2011-05-13 at 11:55 +0100, Mel Gorman wrote:
> On Thu, May 12, 2011 at 07:47:05PM -0500, James Bottomley wrote:
> > On Fri, 2011-05-13 at 00:15 +0200, Johannes Weiner wrote:
> > > On Thu, May 12, 2011 at 05:04:41PM -0500, James Bottomley wrote:
> > > > On Thu, 2011-05-12 at 15:04 -0500, James Bottomley wrote:
> > > > > Confirmed, I'm afraid ... I can trigger the problem with all three
> > > > > patches under PREEMPT.  It's not a hang this time, it's just kswapd
> > > > > taking 100% system time on 1 CPU and it won't calm down after I unload
> > > > > the system.
> > > > 
> > > > Just on a "if you don't know what's wrong poke about and see" basis, I
> > > > sliced out all the complex logic in sleeping_prematurely() and, as far
> > > > as I can tell, it cures the problem behaviour.  I've loaded up the
> > > > system, and taken the tar load generator through three runs without
> > > > producing a spinning kswapd (this is PREEMPT).  I'll try with a
> > > > non-PREEMPT kernel shortly.
> > > > 
> > > > What this seems to say is that there's a problem with the complex logic
> > > > in sleeping_prematurely().  I'm pretty sure hacking up
> > > > sleeping_prematurely() just to dump all the calculations is the wrong
> > > > thing to do, but perhaps someone can see what the right thing is ...
> > > 
> > > I think I see the problem: the boolean logic of sleeping_prematurely()
> > > is odd.  If it returns true, kswapd will keep running.  So if
> > > pgdat_balanced() returns true, kswapd should go to sleep.
> > > 
> > > This?
> > 
> > I was going to say this was a winner, but on the third untar run on
> > non-PREEMPT, I hit the kswapd livelock.  It's got much farther than
> > previous attempts, which all hung on the first run, but I think the
> > essential problem is still (at least on this machine) that
> > sleeping_prematurely() is doing too much work for the wakeup storm that
> > allocators are causing.
> > 
> > Something that ratelimits the amount of time we spend in the watermark
> > calculations, like the below (which incorporates your pgdat fix) seems
> > to be much more stable (I've not run it for three full runs yet, but
> > kswapd CPU time is way lower so far).
> > 
> > The heuristic here is that if we're making the calculation more than ten
> > times in 1/10 of a second, stop and sleep anyway.
> > 
> 
> Is that heuristic not basically the same as this?
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index af24d1e..4d24828 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2251,6 +2251,10 @@ static bool sleeping_prematurely(pg_data_t *pgdat, int order, long remaining,
>  	unsigned long balanced = 0;
>  	bool all_zones_ok = true;
>  
> +	/* If kswapd has been running too long, just sleep */
> +	if (need_resched())
> +		return false;

Not exactly.  That should cure the problem (and I'll test it out).
However, the traces show most of the work is being caused by
sleeping_prematurely().  The object of my patch was actually to cut that
off.  Just checking need_resched() will still allow us to run
around that loop for hundreds of milliseconds and contribute to needless
kswapd CPU time; that's why I used both an iteration count and a time
cutoff in my patch.  If we've run around the loop 10 times tightly,
returning true each time (i.e. we can't sleep and need to rebalance),
but the shrinkers still haven't done enough, it's time to call it quits
and sleep anyway.
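For illustration, here is a minimal userspace sketch of the "at most ten checks per 1/10 of a second" cutoff described above. It is not the actual patch from this thread, which isn't quoted here. The names struct premature_ratelimit, premature_check_allowed(), MAX_PREMATURE_CHECKS and CHECK_WINDOW_MS are all hypothetical, and time is passed in as milliseconds instead of being read from jiffies so the sketch runs outside the kernel.

```c
#include <stdbool.h>

/*
 * Hypothetical userspace model of the ratelimit heuristic: allow the
 * expensive "is sleeping premature?" calculation at most
 * MAX_PREMATURE_CHECKS times per CHECK_WINDOW_MS; after that, skip the
 * calculation entirely and let kswapd sleep anyway.
 */
#define MAX_PREMATURE_CHECKS 10   /* checks allowed per window */
#define CHECK_WINDOW_MS      100  /* window length: 1/10 of a second */

struct premature_ratelimit {
	unsigned long window_start_ms; /* when the current window began */
	unsigned int count;            /* expensive checks run in this window */
};

/*
 * Returns true if the caller may run the expensive watermark check now,
 * false if this window's budget is spent and kswapd should go to sleep
 * without doing the calculation at all.
 */
static bool premature_check_allowed(struct premature_ratelimit *rl,
				    unsigned long now_ms)
{
	/* Unsigned subtraction stays correct across counter wraparound. */
	if (now_ms - rl->window_start_ms >= CHECK_WINDOW_MS) {
		rl->window_start_ms = now_ms;  /* start a fresh window */
		rl->count = 0;
	}

	if (rl->count >= MAX_PREMATURE_CHECKS)
		return false;                  /* looped too tightly: sleep */

	rl->count++;
	return true;
}
```

In this sketch a caller would consult premature_check_allowed() before running the sleeping_prematurely() calculation and treat a false result as "go to sleep". Unlike a need_resched() check, which only fires once the scheduler wants the CPU back, the fixed count bounds how much watermark work kswapd can do in each window regardless of scheduler pressure.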

James

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html