Re: [PATCH RESEND v2 4/6] mm/page_alloc: sort out the alloc_contig_range() gfp flags mess

Zi Yan <ziy@xxxxxxxxxx> · Tue, 03 Dec 2024 10:49:20 -0500

On 3 Dec 2024, at 9:24, Vlastimil Babka wrote:

> On 12/3/24 15:12, David Hildenbrand wrote:
>> On 03.12.24 14:55, Vlastimil Babka wrote:
>>> On 12/3/24 10:47, David Hildenbrand wrote:
>>>> It's all a bit complicated for alloc_contig_range(). For example, we don't
>>>> support many flags, so let's start bailing out on unsupported
>>>> ones -- ignoring the placement hints, as we are already given the range
>>>> to allocate.
>>>>
>>>> While we currently set cc.gfp_mask, in __alloc_contig_migrate_range() we
>>>> simply create yet another GFP mask whereby we ignore the reclaim flags
>>>> specify by the caller. That looks very inconsistent.
>>>>
>>>> Let's clean it up, constructing the gfp flags used for
>>>> compaction/migration exactly once. Update the documentation of the
>>>> gfp_mask parameter for alloc_contig_range() and alloc_contig_pages().
>>>>
>>>> Acked-by: Zi Yan <ziy@xxxxxxxxxx>
>>>> Signed-off-by: David Hildenbrand <david@xxxxxxxxxx>
>>>
>>> Reviewed-by: Vlastimil Babka <vbabka@xxxxxxx>
>>>
>>>> +	/*
>>>> +	 * Flags to control page compaction/migration/reclaim, to free up our
>>>> +	 * page range. Migratable pages are movable, __GFP_MOVABLE is implied
>>>> +	 * for them.
>>>> +	 *
>>>> +	 * Traditionally we always had __GFP_HARDWALL|__GFP_RETRY_MAYFAIL set,
>>>> +	 * keep doing that to not degrade callers.
>>>> +	 */
>>>
>>> Wonder if we could revisit that eventually. Why limit migration targets by
>>> cpuset via __GFP_HARDWALL if we were not called with __GFP_HARDWALL? And why
>>> weaken the attempts with __GFP_RETRY_MAYFAIL if we didn't specify it?
>>
>> See below.
>>
>>>
>>> Unless I'm missing something, cc->gfp is only checked for __GFP_FS and
>>> __GFP_NOWARN in few places, so it's mostly migration_target_control the
>>> callers could meaningfully influence.
>>
>> Note the fist change in the file, where we now use the mask instead of coming up
>> with another one out of the blue. :)
>
> I know. What I wanted to say - cc->gfp is on its own only checked in few
> places, but now since we also translate it to migration_target_control's
> gfp_mask, it's mostly that part the caller might influence with the passed
> flags. But we still impose own additions to it, limiting that influence.
>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index ce7589a4ec01..54594cc4f650 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -6294,7 +6294,7 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
>>   	int ret = 0;
>>   	struct migration_target_control mtc = {
>>   		.nid = zone_to_nid(cc->zone),
>> -		.gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
>> +		.gfp_mask = cc->gfp_mask,
>>   		.reason = MR_CONTIG_RANGE,
>>   	};
>>
>> GFP_USER contains __GFP_HARDWALL. I am not sure if that matters here, but
>
> Yeah wonder if GFP_USER was used specifically for that part, or just randomly :)
>
>> likely the thing we are assuming here is that we are migrating a page, and
>> usually, these are user allocation (except maybe balloon and some other non-lru
>> movable things).
>
> Yeah and user allocations obey cpuset and mempolicies etc. But these are
> likely somebody elses allocations that were done according to their
> policies. With our migration we might be actually violating those, which
> probably can't be helped (is at least migration within the same node
> preferred? hmm). But it doesn't seem to me that our caller's restrictions
> (if those exist, would be enforced by __GFP_HARDWALL) are that relevant for
> somebody else's pages?

Yeah, I was wondering why current_gfp_context() is used to adjust gfp_mask,
since current context might not be relevant. But I see it is used in
the original code, so I did not ask. If current context is irrelevant w.r.t
the to-be-migrated pages, should current_gfp_context() part be removed?

Ideally, to respect the to-be-migrated page’s gfp, we might need to go through
rmap to find its corresponding vma and possible task struct, but that seems
overly complicated.

Best Regards,
Yan, Zi