Re: [PATCH] vmscan: scan pages until it founds eligible pages

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, May 03, 2017 at 08:00:44AM +0200, Michal Hocko wrote:
> On Wed 03-05-17 13:48:09, Minchan Kim wrote:
> > On Tue, May 02, 2017 at 05:14:36PM +0200, Michal Hocko wrote:
> > > On Tue 02-05-17 23:51:50, Minchan Kim wrote:
> > > > Hi Michal,
> > > > 
> > > > On Tue, May 02, 2017 at 09:54:32AM +0200, Michal Hocko wrote:
> > > > > On Tue 02-05-17 14:14:52, Minchan Kim wrote:
> > > > > > Oops, forgot to add lkml and linux-mm.
> > > > > > Sorry for that.
> > > > > > Send it again.
> > > > > > 
> > > > > > >From 8ddf1c8aa15baf085bc6e8c62ce705459d57ea4c Mon Sep 17 00:00:00 2001
> > > > > > From: Minchan Kim <minchan@xxxxxxxxxx>
> > > > > > Date: Tue, 2 May 2017 12:34:05 +0900
> > > > > > Subject: [PATCH] vmscan: scan pages until it founds eligible pages
> > > > > > 
> > > > > > On Tue, May 02, 2017 at 01:40:38PM +0900, Minchan Kim wrote:
> > > > > > There are premature OOM happening. Although there are a ton of free
> > > > > > swap and anonymous LRU list of elgible zones, OOM happened.
> > > > > > 
> > > > > > With investigation, skipping page of isolate_lru_pages makes reclaim
> > > > > > void because it returns zero nr_taken easily so LRU shrinking is
> > > > > > effectively nothing and just increases priority aggressively.
> > > > > > Finally, OOM happens.
> > > > > 
> > > > > I am not really sure I understand the problem you are facing. Could you
> > > > > be more specific please? What is your configuration etc...
> > > > 
> > > > Sure, KVM guest on x86_64, It has 2G memory and 1G swap and configured
> > > > movablecore=1G to simulate highmem zone.
> > > > Workload is a process consumes 2.2G memory and then random touch the
> > > > address space so it makes lots of swap in/out.
> > > > 
> > > > > 
> > > > > > balloon invoked oom-killer: gfp_mask=0x17080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK), nodemask=(null),  order=0, oom_score_adj=0
> > > > > [...]
> > > > > > Node 0 active_anon:1698864kB inactive_anon:261256kB active_file:208kB inactive_file:184kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:532kB dirty:108kB writeback:0kB shmem:172kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
> > > > > > DMA free:7316kB min:32kB low:44kB high:56kB active_anon:8064kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB slab_reclaimable:464kB slab_unreclaimable:40kB kernel_stack:0kB pagetables:24kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> > > > > > lowmem_reserve[]: 0 992 992 1952
> > > > > > DMA32 free:9088kB min:2048kB low:3064kB high:4080kB active_anon:952176kB inactive_anon:0kB active_file:36kB inactive_file:0kB unevictable:0kB writepending:88kB present:1032192kB managed:1019388kB mlocked:0kB slab_reclaimable:13532kB slab_unreclaimable:16460kB kernel_stack:3552kB pagetables:6672kB bounce:0kB free_pcp:56kB local_pcp:24kB free_cma:0kB
> > > > > > lowmem_reserve[]: 0 0 0 959
> > > > > 
> > > > > Hmm DMA32 has sufficient free memory to allow this order-0 request.
> > > > > Inactive anon lru is basically empty. Why do not we rotate a really
> > > > > large active anon list? Isn't this the primary problem?
> > > > 
> > > > It's a side effect by skipping page logic in isolate_lru_pages
> > > > I mentioned above in changelog.
> > > > 
> > > > The problem is a lot of anonymous memory in movable zone(ie, highmem)
> > > > and non-small memory in DMA32 zone.
> > > 
> > > Such a configuration is questionable on its own. But let't keep this
> > > part alone.
> > 
> > It seems you are misunderstood. It's really common on 32bit.
> 
> Yes, I am not arguing about 32b systems. It is quite common to see
> issues which are inherent to the highmem zone.
> 
> > Think of 2G DRAM system on 32bit. Normally, it's 1G normal:1G highmem.
> > It's almost same with one I configured.
> > 
> > > 
> > > > In heavy memory pressure,
> > > > requesting a page in GFP_KERNEL triggers reclaim. VM knows inactive list
> > > > is low so it tries to deactivate pages. For it, first of all, it tries
> > > > to isolate pages from active list but there are lots of anonymous pages
> > > > from movable zone so skipping logic in isolate_lru_pages works. With
> > > > the result, isolate_lru_pages cannot isolate any eligible pages so
> > > > reclaim trial is effectively void. It continues to meet OOM.
> > > 
> > > But skipped pages should be rotated and we should eventually hit pages
> > > from the right zone(s). Moreover we should scan the full LRU at priority
> > > 0 so why exactly we hit the OOM killer?
> > 
> > Yes, full scan in priority 0 but keep it in mind that the number of full
> > LRU pages to scan is one of eligible pages, not all pages of the node.
> 
> I have hard time understanding what you are trying to say here.
> 
> > And isolate_lru_pages have accounted skipped pages as scan count so that
> > VM cannot isolate any pages of eligible pages in LRU if non-eligible pages
> > are a lot in the LRU.
> > 
> > > 
> > > Anyway [1] has changed this behavior. Are you seeing the issue with this
> > > patch dropped?
> > 
> > Good point. Before the patch, it didn't increase scan count with skipped
> > pages so with reverting [1], I guess it might work but worry about
> > isolating lots of skipped pages into temporal pages_skipped list which
> > might causes premate OOM. Anyway, I will test it when I returns at
> > office after vacation.
> 
> I do not think we want to drop this patch. I think we might be good
> enough to simply fold this into the patch
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 24efcc20af91..ac146f10f222 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1472,7 +1472,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
>  	LIST_HEAD(pages_skipped);
>  
>  	for (scan = 0; scan < nr_to_scan && nr_taken < nr_to_scan &&
> -					!list_empty(src); scan++) {
> +					!list_empty(src);) {
>  		struct page *page;
>  
>  		page = lru_to_page(src);
> @@ -1486,6 +1486,12 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
>  			continue;
>  		}
>  
> +		/*
> +		 * Do not count skipped pages because we do want to isolate
> +		 * some pages even when the LRU mostly contains ineligible
> +		 * pages
> +		 */

How about adding comment about "why"?

/*
 * Do not count skipped pages because it makes the function to return with
 * none isolated pages if the LRU mostly contains inelgible pages so that
 * VM cannot reclaim any pages and trigger premature OOM.
 */


> +		scan++;
>  		switch (__isolate_lru_page(page, mode)) {
>  		case 0:
>  			nr_pages = hpage_nr_pages(page);

Confirmed. It works as expected but it changed scan counter's behavior.
How about this?


diff --git a/mm/vmscan.c b/mm/vmscan.c
index 2314aca47d12..846922d7942e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1469,7 +1469,7 @@ static __always_inline void update_lru_sizes(struct lruvec *lruvec,
  *
  * Appropriate locks must be held before calling this function.
  *
- * @nr_to_scan:	The number of pages to look through on the list.
+ * @nr_to_scan:	The number of eligible pages to look through on the list.
  * @lruvec:	The LRU vector to pull pages from.
  * @dst:	The temp list to put pages on to.
  * @nr_scanned:	The number of pages that were scanned.
@@ -1489,11 +1489,13 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 	unsigned long nr_zone_taken[MAX_NR_ZONES] = { 0 };
 	unsigned long nr_skipped[MAX_NR_ZONES] = { 0, };
 	unsigned long skipped = 0;
-	unsigned long scan, nr_pages;
+	unsigned long scan, total_scan, nr_pages;
 	LIST_HEAD(pages_skipped);
 
-	for (scan = 0; scan < nr_to_scan && nr_taken < nr_to_scan &&
-					!list_empty(src); scan++) {
+	for (total_scan = scan = 0; scan < nr_to_scan &&
+					nr_taken < nr_to_scan &&
+					!list_empty(src);
+					total_scan++) {
 		struct page *page;
 
 		page = lru_to_page(src);
@@ -1507,6 +1509,13 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 			continue;
 		}
 
+		/*
+		 * Do not count skipped pages because it makes the function to
+		 * return with none isolated pages if the LRU mostly contains
+		 * inelgible pages so that VM cannot reclaim any pages and
+		 * trigger premature OOM.
+		 */
+		scan++;
 		switch (__isolate_lru_page(page, mode)) {
 		case 0:
 			nr_pages = hpage_nr_pages(page);
@@ -1544,9 +1553,9 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 			skipped += nr_skipped[zid];
 		}
 	}
-	*nr_scanned = scan;
+	*nr_scanned = total_scan;
 	trace_mm_vmscan_lru_isolate(sc->reclaim_idx, sc->order, nr_to_scan,
-				    scan, skipped, nr_taken, mode, lru);
+				    total_scan, skipped, nr_taken, mode, lru);
 	update_lru_sizes(lruvec, lru, nr_zone_taken);
 	return nr_taken;
 }

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>



[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]
  Powered by Linux