Re: [PATCH 1/4] mm/hugetlb: Enable PUD level huge page migration

Anshuman Khandual <anshuman.khandual@xxxxxxx> · Wed, 3 Oct 2018 15:28:23 +0530

On 10/03/2018 12:28 PM, Michal Hocko wrote:
> On Wed 03-10-18 07:46:27, Anshuman Khandual wrote:
>>
>>
>> On 10/02/2018 06:09 PM, Michal Hocko wrote:
>>> On Tue 02-10-18 17:45:28, Anshuman Khandual wrote:
>>>> Architectures like arm64 have PUD level HugeTLB pages for certain configs
>>>> (1GB huge page is PUD based on ARM64_4K_PAGES base page size) that can be
>>>> enabled for migration. It can be achieved through checking for PUD_SHIFT
>>>> order based HugeTLB pages during migration.
>>>
>>> Well a long term problem with hugepage_migration_supported is that it is
>>> used in two different context 1) to bail out from the migration early
>>> because the arch doesn't support migration at all and 2) to use movable
>>> zone for hugetlb pages allocation. I am especially concerned about the
>>> latter because the mere support for migration is not really good enough.
>>> Are you really able to find a different giga page during the runtime to
>>> move an existing giga page out of the movable zone?
>>
>> I pre-allocate them before trying to initiate the migration (soft offline
>> in my experiments). Hence it should come from the pre-allocated HugeTLB
>> pool instead from the buddy. I might be missing something here but do we
>> ever allocate HugeTLB on the go when trying to migrate ? IIUC it always
>> came from the pool (unless it's something related to overcommit/surplus).
>> Could you please kindly explain regarding how migration target HugeTLB
>> pages are allocated on the fly from movable zone.
> 
> Hotplug comes to mind. You usually do not pre-allocate to cover full
> node going offline. And people would like to do that. Another example is
> CMA. You would really like to move pages out of the way.

You are right.

Hotplug migration:

__offline_pages
   do_migrate_range
	migrate_pages(...new_node_page...)

new_node_page
   new_page_nodemask
	alloc_huge_page_nodemask
	   dequeue_huge_page_nodemask (Getting from pool)
	or
	   alloc_migrate_huge_page    (Getting from buddy - non-gigantic)
		alloc_fresh_huge_page
		    alloc_buddy_huge_page
			__alloc_pages_nodemask ----> goes into buddy

CMA allocation:

cma_alloc
   alloc_contig_range
	__alloc_contig_migrate_range
		migrate_pages(...alloc_migrate_target...)

alloc_migrate_target
   new_page_nodemask -> __alloc_pages_nodemask ---> goes into buddy

But this is not applicable to gigantic pages, for which the allocation backs
off well before going into the buddy. With MAX_ORDER as 11, that is anything
beyond 64MB for 64K pages, 16MB for 16K pages, 4MB for 4K pages etc. So all
those bigger huge pages like 512MB/1GB/16GB will not be part of the HugeTLB/CMA
initiated migrations. I will look into the migration details for auto NUMA,
compaction, memory-failure etc to see whether a gigantic huge page is allocated
from the buddy with __alloc_pages_nodemask() or with alloc_contig_range().

> 
>> But even if there are some chances of run time allocation failure from
>> movable zone (as in point 2) that should not block the very initiation
>> of migration itself. IIUC thats not the semantics for either THP or
>> normal pages. Why should it be different here. If the allocation fails
>> we should report and abort as always. Its the caller of migration taking
>> the chances. why should we prevent that.
> 
> Yes I agree, hence the distinction between the arch support for
> migrateability and the criterion on the movable zone placement.

Movable zone placement sounds very tricky here. How can the platform
(through the hugepage_movable hook) say beforehand whether the destination
page could be allocated from ZONE_MOVABLE without looking into the state
of the buddy at migration time (any sort of attempt to do this is going to
be expensive) ? Or does it merely indicate the willingness to live with the
possible consequence (being unable to hot unplug/use CMA going forward) of a
migration which might end up in an unmovable area ?

>  
>>>
>>> So I guess we want to split this into two functions
>>> arch_hugepage_migration_supported and hugepage_movable. The latter would
>>
>> So the set difference between arch_hugepage_migration_supported and 
>> hugepage_movable still remains un-migratable ? Then what is the purpose
>> for arch_hugepage_migration_supported page size set in the first place.
>> Does it mean we allow the migration at the beginning and the abort later
>> when the page size does not fall within the subset for hugepage_movable.
>> Could you please kindly explain this in more detail.
> 
> The purpose of arch_hugepage_migration_supported is to tell whether it
> makes any sense to even try to migrate. The allocation placement is

Which more or less matches what we have right now, and this series
continues with the same approach.

> completely independent of this choice. The latter just says whether it is
> feasible to place a hugepage to the zone movable. Sure regular 2MB pages

What exactly do you mean by feasible ? Won't it depend on the state of the
buddy allocator (ZONE_MOVABLE in particular) and its ability to accommodate
a given huge page ? How can the platform decide on it ? Or, as I mentioned
before, is it the platform's willingness to live with unmovable huge pages
(of certain sizes) as a consequence of migration ?

> do not guarantee movability as well because of the memory fragmentation.
> But allocating a 2MB is a completely different story from 1G or even
> larger huge pages, isn't it?

Right, I understand that. Hotplug/CMA capability degrades further as
bigger huge pages become unmovable on the system.

> 
>>> be a reasonably migrateable subset of the former. Without that this
>>> patch might introduce subtle regressions when somebody relies on movable
>>> zone to be really movable.
>> PUD based HugeTLB pages were never migratable, then how can there be any
>> regression here ?
> 
> That means that they haven't been allocated from the movable zone
> before. Which is going to change by this patch.

The source PUD huge page might have been allocated from the movable zone.
The denial of migration is explicit: it happens because we don't check for
PUD_SHIFT in hugepage_migration_supported(), and it has nothing to do with
the zone type the source page belongs to. But are you referring to a
regression caused by something like this ?

Before the patch:

- PUD huge page allocated on ZONE_MOVABLE
- Huge page is movable (Hotplug/CMA works)

After the patch:

- PUD huge page allocated on ZONE_MOVABLE
- Migration is successful without checking for destination page's zone
- PUD huge page (new) is not on ZONE_MOVABLE
- Huge page is unmovable (Hotplug/CMA does not work anymore) -> Regression!

> 
>> At present we even support PGD based HugeTLB pages for
>> migration.
> 
> And that is already wrong but nobody probably cares because those are
> rarely used.
> 
>> Wondering how PUD based ones are going to be any different.
> 
> It is not different, PGD is dubious already.
>
Got it.