On Wed, 15 Sep 2010 16:28:43 +0800 Wu Fengguang <fengguang.wu@xxxxxxxxx> wrote: > Neil, > > Sorry for the rushed and imaginary ideas this morning.. > > > @@ -1101,6 +1101,12 @@ static unsigned long shrink_inactive_lis > > int lumpy_reclaim = 0; > > > > while (unlikely(too_many_isolated(zone, file, sc))) { > > + if ((sc->gfp_mask & GFP_IOFS) != GFP_IOFS) > > + /* Not allowed to do IO, so mustn't wait > > + * on processes that might try to > > + */ > > + return SWAP_CLUSTER_MAX; > > + > > The above patch should behavior like this: it returns SWAP_CLUSTER_MAX > to cheat all the way up to believe "enough pages have been reclaimed". > So __alloc_pages_direct_reclaim() see non-zero *did_some_progress and > go on to call get_page_from_freelist(). That normally fails because > the task didn't really scanned the LRU lists. However it does have the > possibility to succeed -- when so many processes are doing concurrent > direct reclaims, it may luckily get one free page reclaimed by other > tasks. What's more, if it does fail to get a free page, the upper > layer __alloc_pages_slowpath() will be repeat recalling > __alloc_pages_direct_reclaim(). So, sooner or later it will succeed in > "stealing" a free page reclaimed by other tasks. > > In summary, the patch behavior for !__GFP_IO/FS is > - won't do any page reclaim > - won't fail the page allocation (unexpected) > - will wait and steal one free page from others (unreasonable) > > So it will address the problem you encountered, however it sounds > pretty unexpected and illogical behavior, right? > > I believe this patch will address the problem equally well. > What do you think? Thank you for the detailed explanation. Is agree with your reasoning and now understand why your patch is sufficient. I will get it tested and let you know how that goes. Thanks, NeilBrown > > Thanks, > Fengguang > --- > > mm: Avoid possible deadlock caused by too_many_isolated() > > Neil finds that if too_many_isolated() returns true while performing > direct reclaim we can end up waiting for other threads to complete their > direct reclaim. If those threads are allowed to enter the FS or IO to > free memory, but this thread is not, then it is possible that those > threads will be waiting on this thread and so we get a circular > deadlock. > > some task enters direct reclaim with GFP_KERNEL > => too_many_isolated() false > => vmscan and run into dirty pages > => pageout() > => take some FS lock > => fs/block code does GFP_NOIO allocation > => enter direct reclaim again > => too_many_isolated() true > => waiting for others to progress, however the other > tasks may be circular waiting for the FS lock.. > > The fix is to let !__GFP_IO and !__GFP_FS direct reclaims enjoy higher > priority than normal ones, by honouring them higher throttle threshold. > > Now !__GFP_IO/FS reclaims won't be waiting for __GFP_IO/FS reclaims to > progress. They will be blocked only when there are too many concurrent > !__GFP_IO/FS reclaims, however that's very unlikely because the IO-less > direct reclaims is able to progress much more faster, and they won't > deadlock each other. The threshold is raised high enough for them, so > that there can be sufficient parallel progress of !__GFP_IO/FS reclaims. > > Reported-by: NeilBrown <neilb@xxxxxxx> > Signed-off-by: Wu Fengguang <fengguang.wu@xxxxxxxxx> > --- > mm/vmscan.c | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > --- linux-next.orig/mm/vmscan.c 2010-09-15 11:58:58.000000000 +0800 > +++ linux-next/mm/vmscan.c 2010-09-15 15:36:14.000000000 +0800 > @@ -1141,36 +1141,39 @@ int isolate_lru_page(struct page *page) > return ret; > } > > /* > * Are there way too many processes in the direct reclaim path already? > */ > static int too_many_isolated(struct zone *zone, int file, > struct scan_control *sc) > { > unsigned long inactive, isolated; > + int ratio; > > if (current_is_kswapd()) > return 0; > > if (!scanning_global_lru(sc)) > return 0; > > if (file) { > inactive = zone_page_state(zone, NR_INACTIVE_FILE); > isolated = zone_page_state(zone, NR_ISOLATED_FILE); > } else { > inactive = zone_page_state(zone, NR_INACTIVE_ANON); > isolated = zone_page_state(zone, NR_ISOLATED_ANON); > } > > - return isolated > inactive; > + ratio = sc->gfp_mask & (__GFP_IO | __GFP_FS) ? 1 : 8; > + > + return isolated > inactive * ratio; > } > > /* > * TODO: Try merging with migrations version of putback_lru_pages > */ > static noinline_for_stack void > putback_lru_pages(struct zone *zone, struct scan_control *sc, > unsigned long nr_anon, unsigned long nr_file, > struct list_head *page_list) > { -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxxx For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>