On Wed, Jul 31, 2019 at 01:08:44PM +0200, Vlastimil Babka wrote:
> On 7/26/19 9:40 AM, Hillf Danton wrote:
> >
> > On Thu, 25 Jul 2019 08:05:55 +0000 (UTC) Mel Gorman wrote:
> >>
> >> Agreed that the description could do with improvement. However, it
> >> makes sense that if compaction reports it can make progress that it is
> >> unnecessary to continue reclaiming.
> >
> > Thanks Mike and Mel.
> >
> > Hillf
> > ---8<---
> > From: Hillf Danton <hdanton@xxxxxxxx>
> > Subject: [RFC PATCH 1/3] mm, reclaim: make should_continue_reclaim perform dryrun detection
> >
> > Address the issue of should_continue_reclaim continuing true too often
> > for __GFP_RETRY_MAYFAIL attempts when !nr_reclaimed and nr_scanned.
> > This could happen during hugetlb page allocation causing stalls for
> > minutes or hours.
> >
> > We can stop reclaiming pages if compaction reports it can make a progress.
> > A code reshuffle is needed to do that. And it has side-effects, however,
> > with allocation latencies in other cases but that would come at the cost
> > of potential premature reclaim which has consequences of itself.
>
> I don't really understand that paragraph, did Mel meant it like this?
>

Fundamentally, the balancing act is between a) reclaiming more now so
that compaction is more likely to succeed and b) keeping pages resident
to avoid refaulting. With a), high-order allocations are faster, less
likely to stall and more likely to succeed. However, it can also
prematurely reclaim pages and free more memory than is necessary for
compaction to succeed in a reasonable amount of time. We also know from
testing that it can hit corner cases with hugetlbfs where stalls happen
for prolonged periods of time anyway, and the series overall is known to
fix those stalls.

> > Cc: Vlastimil Babka <vbabka@xxxxxxx>
> > Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
> > Signed-off-by: Hillf Danton <hdanton@xxxxxxxx>
>
> I agree this is an improvement overall, but perhaps the patch does too
> many things at once.
> The reshuffle is one thing and makes sense. The
> change of the last return condition could perhaps be separate. Also
> AFAICS the ultimate result is that when nr_reclaimed == 0, the function
> will now always return false. Which makes the initial test for
> __GFP_RETRY_MAYFAIL and the comments there misleading. There will no
> longer be a full LRU scan guaranteed - as long as the scanned LRU chunk
> yields no reclaimed page, we abort.
>

I've no strong feelings on whether it is worth splitting the patch. In
my mind it's more or less doing one thing even though the one thing is a
relatively high-level problem.

-- 
Mel Gorman
SUSE Labs