On Wed 22-08-18 11:02:14, Michal Hocko wrote:
> On Tue 21-08-18 17:40:49, Andrea Arcangeli wrote:
> > On Tue, Aug 21, 2018 at 01:50:57PM +0200, Michal Hocko wrote:
> [...]
> > > I really detest a new gfp flag for a one time semantic that is muddy
> > > as hell.
> >
> > Well, there's no way to fix this other than to prevent reclaim from
> > running, if you still want to give the page fault a chance to obtain
> > THP under MADV_HUGEPAGE without waiting minutes or hours for
> > khugepaged to catch up with it.
>
> I do not get that part. Why should the caller even care about reclaim
> vs. compaction? How can you even make an educated guess about what
> makes more sense? This should be fully controlled by the allocator
> path. The caller should only care about how hard to try. It's been some
> time since I've looked, but we used to have gfp flags to tell that for
> THP allocations as well.

In other words, why do we even try to swap out when allocating a costly
high-order page for a request which does not insist on trying really
hard? I mean, why don't we do something like this?
---
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 03822f86f288..41005d3d4c2d 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3071,6 +3071,14 @@ unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
 	if (throttle_direct_reclaim(sc.gfp_mask, zonelist, nodemask))
 		return 1;
 
+	/*
+	 * If we are allocating a costly order and do not insist on trying really
+	 * hard then we should keep the reclaim impact at minimum. So only
+	 * focus on easily reclaimable memory.
+	 */
+	if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & __GFP_RETRY_MAYFAIL))
+		sc.may_swap = sc.may_unmap = 0;
+
 	trace_mm_vmscan_direct_reclaim_begin(order,
 				sc.may_writepage,
 				sc.gfp_mask,
-- 
Michal Hocko
SUSE Labs
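
To make the proposed division of responsibility concrete, below is a small
standalone sketch in the spirit of the hunk above. It is ordinary userspace
C, not kernel code: the flag bit, struct scan_control, and the helper names
(thp_fault_gfp, setup_scan_control) are simplified stand-ins for the real
definitions in <linux/gfp.h> and mm/vmscan.c, and the mapping from
MADV_HUGEPAGE to the "try hard" bit is only one possible policy used for
illustration. The caller expresses nothing but how hard to try; the reclaim
setup then decides whether swapping/unmapping is worth it.

/*
 * Standalone illustration (not kernel code): the caller encodes only "how
 * hard to try" in the gfp mask; the reclaim setup decides whether swapping
 * and unmapping are worth it for a costly-order request.
 */
#include <stdbool.h>
#include <stdio.h>

#define PAGE_ALLOC_COSTLY_ORDER	3		/* same threshold the kernel uses */
#define __GFP_RETRY_MAYFAIL	(1u << 0)	/* "try really hard" (stand-in bit) */
#define HPAGE_PMD_ORDER		9		/* a THP-sized request on x86-64 */

struct scan_control {
	bool may_swap;		/* allowed to swap anonymous pages */
	bool may_unmap;		/* allowed to unmap mapped pages */
};

/* Caller side: only expresses how hard the allocation should try. */
static unsigned int thp_fault_gfp(bool madv_hugepage)
{
	/*
	 * One possible policy, purely for this sketch: a MADV_HUGEPAGE
	 * fault is willing to pay for direct reclaim/compaction effort.
	 */
	return madv_hugepage ? __GFP_RETRY_MAYFAIL : 0;
}

/* Reclaim side: mirrors the check added to try_to_free_pages() above. */
static void setup_scan_control(struct scan_control *sc, int order,
			       unsigned int gfp_mask)
{
	sc->may_swap = true;
	sc->may_unmap = true;

	/*
	 * Costly order and the caller does not insist on trying really
	 * hard: only touch easily reclaimable (clean, unmapped) memory.
	 */
	if (order > PAGE_ALLOC_COSTLY_ORDER &&
	    !(gfp_mask & __GFP_RETRY_MAYFAIL)) {
		sc->may_swap = false;
		sc->may_unmap = false;
	}
}

int main(void)
{
	struct scan_control sc;

	setup_scan_control(&sc, HPAGE_PMD_ORDER, thp_fault_gfp(false));
	printf("plain THP fault:         may_swap=%d may_unmap=%d\n",
	       sc.may_swap, sc.may_unmap);

	setup_scan_control(&sc, HPAGE_PMD_ORDER, thp_fault_gfp(true));
	printf("MADV_HUGEPAGE THP fault: may_swap=%d may_unmap=%d\n",
	       sc.may_swap, sc.may_unmap);
	return 0;
}

The point of the split is that only the reclaim code knows what "minimal
impact" means on a given system, while the caller's gfp mask stays a plain
statement of intent rather than a reclaim-vs-compaction decision.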