On Thu, 11 Jul 2019 02:42:56 +0800 Mike Kravetz wrote:
> On 7/7/19 10:19 PM, Hillf Danton wrote:
> > On Mon, 01 Jul 2019 20:15:51 -0700 Mike Kravetz wrote:
> >> On 7/1/19 1:59 AM, Mel Gorman wrote:
> >>>
> >>> I think it would be reasonable to have should_continue_reclaim allow an
> >>> exit if scanning at higher priority than DEF_PRIORITY - 2, nr_scanned is
> >>> less than SWAP_CLUSTER_MAX and no pages are being reclaimed.
> >>
> >> Thanks Mel,
> >>
> >> I added such a check to should_continue_reclaim. However, it does not
> >> address the issue I am seeing. In that do-while loop in shrink_node,
> >> the scan priority is not raised (priority--). We can enter the loop
> >> with priority == DEF_PRIORITY and continue to loop for minutes as seen
> >> in my previous debug output.
> >>
> > Does it help to raise priority in your case?
>
> Thanks Hillf, sorry for the delay in responding; I have been AFK.
>
> I am not sure if you wanted to try this somehow in addition to Mel's
> suggestion, or alone.

I thought you might take a look at it if you continued to have difficulty
raising the scanning priority.

> Unfortunately, such a change actually causes worse behavior.
>
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -2543,11 +2543,18 @@ static inline bool should_continue_reclaim(struct pglist_data *pgdat,
> >  	unsigned long pages_for_compaction;
> >  	unsigned long inactive_lru_pages;
> >  	int z;
> > +	bool costly_fg_reclaim = false;
> >
> >  	/* If not in reclaim/compaction mode, stop */
> >  	if (!in_reclaim_compaction(sc))
> >  		return false;
> >
> > +	/* Let compact determine what to do for high order allocators */
> > +	costly_fg_reclaim = sc->order > PAGE_ALLOC_COSTLY_ORDER &&
> > +			    !current_is_kswapd();
> > +	if (costly_fg_reclaim)
> > +		goto check_compact;
>
> This goto makes us skip the 'if (!nr_reclaimed && !nr_scanned)' test.

Correct.
> > +
> >  	/* Consider stopping depending on scan and reclaim activity */
> >  	if (sc->gfp_mask & __GFP_RETRY_MAYFAIL) {
> >  		/*
> > @@ -2571,6 +2578,7 @@ static inline bool should_continue_reclaim(struct pglist_data *pgdat,
> >  		return false;
> >  	}
> >
> > +check_compact:
> >  	/*
> >  	 * If we have not reclaimed enough pages for compaction and the
> >  	 * inactive lists are large enough, continue reclaiming
>
> It is quite easy to hit the condition where:
> nr_reclaimed == 0 && nr_scanned == 0 is true, but we skip the previous test

Hmm, it is becoming harder to imagine what prevented you from raising the
priority, given that it was not difficult to hit the true condition above.

And I see the following in the mail thread:

===
Date: Fri, 28 Jun 2019 11:20:42 -0700
Message-ID: <dede2f84-90bf-347a-2a17-fb6b521bf573@xxxxxxxxxx> (raw)
In-Reply-To: <04329fea-cd34-4107-d1d4-b2098ebab0ec@xxxxxxx>

I got back to looking into the direct reclaim/compaction stalls when
trying to allocate huge pages.

As previously mentioned, the code is looping for a long time in
shrink_node(). The routine should_continue_reclaim() returns true
perhaps more often than it should. As Vlastimil guessed, my debug code
output below shows nr_scanned is remaining non-zero for quite a while.
This was on v5.2-rc6.
===

There, nr_scanned != 0 and should_continue_reclaim() does not return false,
which does not match the condition you say is easy to hit.

> and the compaction check:
> sc->nr_reclaimed < pages_for_compaction &&
> inactive_lru_pages > pages_for_compaction
>
> is true, so we return true before the below check of costly_fg_reclaim

That is the price high order allocations pay: reclaiming enough pages for
compaction to do its work. Getting no pages reclaimed and scanned while
there are plenty of inactive pages is really hard to imagine, and
costly_fg_reclaim is not good for that, imho.
> > @@ -2583,6 +2591,9 @@ static inline bool should_continue_reclaim(struct pglist_data *pgdat,
> >  	    inactive_lru_pages > pages_for_compaction)
> >  		return true;
> >
> > +	if (costly_fg_reclaim)
> > +		return false;
> > +
> >  	/* If compaction would go ahead or the allocation would succeed, stop */
> >  	for (z = 0; z <= sc->reclaim_idx; z++) {
> >  		struct zone *zone = &pgdat->node_zones[z];
> > --
> >
>
> As Michal suggested, I'm going to do some testing to see what impact
> dropping the __GFP_RETRY_MAYFAIL flag for these huge page allocations
> will have on the number of pages allocated.
> --
> Mike Kravetz