Re: Deadlock possibly caused by too_many_isolated.

KOSAKI Motohiro <kosaki.motohiro@xxxxxxxxxxxxxx> · Mon, 18 Oct 2010 14:04:57 +0900 (JST)

> On Wed, 15 Sep 2010 18:44:34 +1000
> Neil Brown <neilb@xxxxxxx> wrote:
> 
> > On Wed, 15 Sep 2010 16:28:43 +0800
> > Wu Fengguang <fengguang.wu@xxxxxxxxx> wrote:
> > 
> > > Neil,
> > > 
> > > Sorry for the rushed and imaginary ideas this morning.
> > > 
> > > > @@ -1101,6 +1101,12 @@ static unsigned long shrink_inactive_lis
> > > >  	int lumpy_reclaim = 0;
> > > >  
> > > >  	while (unlikely(too_many_isolated(zone, file, sc))) {
> > > > +		if ((sc->gfp_mask & GFP_IOFS) != GFP_IOFS)
> > > > +			/* Not allowed to do IO, so mustn't wait
> > > > +			 * on processes that might try to
> > > > +			 */
> > > > +			return SWAP_CLUSTER_MAX;
> > > > +
> > > 
> > > The above patch should behave like this: it returns SWAP_CLUSTER_MAX
> > > to trick every caller up the stack into believing "enough pages have
> > > been reclaimed". So __alloc_pages_direct_reclaim() sees a non-zero
> > > *did_some_progress and goes on to call get_page_from_freelist(). That
> > > normally fails because the task didn't really scan the LRU lists.
> > > However, it can succeed: when many processes are doing concurrent
> > > direct reclaim, it may luckily get a free page reclaimed by another
> > > task. What's more, if it does fail to get a free page, the upper
> > > layer __alloc_pages_slowpath() will repeatedly call
> > > __alloc_pages_direct_reclaim(). So, sooner or later it will succeed in
> > > "stealing" a free page reclaimed by other tasks.
> > > 
> > > In summary, the patch behavior for !__GFP_IO/FS is
> > > - won't do any page reclaim
> > > - won't fail the page allocation (unexpected)
> > > - will wait and steal one free page from others (unreasonable)
> > > 
> > > So it will address the problem you encountered; however, that is
> > > pretty unexpected and illogical behavior, right?
> > > 
> > > I believe this patch will address the problem equally well.
> > > What do you think?
> > 
> > Thank you for the detailed explanation.  I agree with your reasoning and
> > now understand why your patch is sufficient.
> > 
> > I will get it tested and let you know how that goes.
> 
> (sorry this has taken a month to follow up).
> 
> Testing shows that this patch seems to work.
> The test load (essentially kernbench) doesn't deadlock any more, though it
> does get bogged down thrashing in swap so it doesn't make a lot more
> progress :-)  I guess that is to be expected.
> 
> One observation is that kernbench generated 10%-20% more context switches
> with the patch than without.  Is that to be expected?
> 
> Do you have plans for sending this patch upstream?

Wow, I had thought this patch had already been merged. Wu, can you please
repost it? And please add my and Neil's Acked-by tags.

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxxx  For more info on Linux MM,
see: http://www.linux-mm.org/ .