On 10/03/2018 12:28 PM, Michal Hocko wrote:
> On Wed 03-10-18 07:46:27, Anshuman Khandual wrote:
>>
>>
>> On 10/02/2018 06:09 PM, Michal Hocko wrote:
>>> On Tue 02-10-18 17:45:28, Anshuman Khandual wrote:
>>>> Architectures like arm64 have PUD level HugeTLB pages for certain configs
>>>> (1GB huge page is PUD based on ARM64_4K_PAGES base page size) that can be
>>>> enabled for migration. It can be achieved through checking for PUD_SHIFT
>>>> order based HugeTLB pages during migration.
>>>
>>> Well a long term problem with hugepage_migration_supported is that it is
>>> used in two different contexts: 1) to bail out from the migration early
>>> because the arch doesn't support migration at all and 2) to use movable
>>> zone for hugetlb pages allocation. I am especially concerned about the
>>> latter because the mere support for migration is not really good enough.
>>> Are you really able to find a different giga page during the runtime to
>>> move an existing giga page out of the movable zone?
>>
>> I pre-allocate them before trying to initiate the migration (soft offline
>> in my experiments). Hence it should come from the pre-allocated HugeTLB
>> pool instead of from the buddy. I might be missing something here, but do
>> we ever allocate HugeTLB pages on the go when trying to migrate? IIUC they
>> always come from the pool (unless it's something related to
>> overcommit/surplus). Could you please explain how migration target
>> HugeTLB pages are allocated on the fly from the movable zone?
>
> Hotplug comes to mind. You usually do not pre-allocate to cover full
> node going offline. And people would like to do that. Another example is
> CMA. You would really like to move pages out of the way.

You are right.

Hotplug migration:

__offline_pages
  do_migrate_range
    migrate_pages(...new_node_page...)
      new_node_page
        new_page_nodemask
          alloc_huge_page_nodemask
            dequeue_huge_page_nodemask (getting from pool)
            or
            alloc_migrate_huge_page (getting from buddy - non-gigantic)
              alloc_fresh_huge_page
                alloc_buddy_huge_page
                  __alloc_pages_nodemask ----> goes into buddy

CMA allocation:

cma_alloc
  alloc_contig_range
    __alloc_contig_migrate_range
      migrate_pages(...alloc_migrate_target...)
        alloc_migrate_target
          new_page_nodemask -> __alloc_pages_nodemask ---> goes into buddy

But this is not applicable to gigantic pages, for which it backs off well
before going into the buddy allocator. With MAX_ORDER set to 11, that means
anything larger than 64MB with 64K pages, 16MB with 16K pages, 4MB with 4K
pages, etc. So all the bigger huge pages like 512MB/1GB/16GB will not be part
of the HugeTLB/CMA initiated migrations.

I will look into the migration paths for auto NUMA, compaction,
memory-failure etc. to see whether gigantic huge pages are allocated from
the buddy with __alloc_pages_nodemask() or with alloc_contig_range().

>
>> But even if there are some chances of run time allocation failure from
>> movable zone (as in point 2) that should not block the very initiation
>> of migration itself. IIUC that's not the semantics for either THP or
>> normal pages. Why should it be different here? If the allocation fails
>> we should report and abort as always. It's the caller of migration taking
>> the chances. Why should we prevent that?
>
> Yes I agree, hence the distinction between the arch support for
> migratability and the criterion on the movable zone placement.

Movable zone placement sounds very tricky here. How can the platform
(through the huge_movable hook) say beforehand whether the destination page
can be allocated from ZONE_MOVABLE without looking into the state of the
buddy allocator at migration time (any attempt to do that is going to be
expensive)? Or does the hook merely indicate a willingness to live with the
possible consequence (no hot unplug/CMA going forward) of a migration that
might end up in an unmovable area?
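The MAX_ORDER cutoff quoted above is plain arithmetic: with MAX_ORDER = 11 the largest buddy block is 2^10 base pages, and a huge page whose order reaches MAX_ORDER is gigantic, so the hotplug/CMA paths cannot get a target page for it from the buddy allocator. Below is a small userspace C sketch of that calculation, assuming MAX_ORDER = 11 as stated in the message. The helper names (huge_order, max_buddy_bytes, is_gigantic) are hypothetical and are not kernel functions; the `order >= MAX_ORDER` comparison is meant to mirror the test hugetlb uses for gigantic hstates in kernels of this era.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: MAX_ORDER is 11, as in the message, so the buddy
 * allocator's largest block has order MAX_ORDER - 1 = 10. */
#define SKETCH_MAX_ORDER 11

/* Hypothetical helper: order of a huge page of huge_size bytes built from
 * base_size pages, i.e. the smallest n with base_size * 2^n >= huge_size. */
unsigned int huge_order(unsigned long long huge_size,
			unsigned long long base_size)
{
	unsigned int order = 0;

	assert(base_size != 0);	/* a zero base page size would loop forever */
	while ((base_size << order) < huge_size)
		order++;
	return order;
}

/* Hypothetical helper: largest block the buddy allocator can return,
 * 2^(MAX_ORDER - 1) base pages. */
unsigned long long max_buddy_bytes(unsigned long long base_size)
{
	return base_size << (SKETCH_MAX_ORDER - 1);
}

/* Hypothetical helper: a size is gigantic when its order is too large for
 * the buddy allocator, so a migration target cannot come from there. */
bool is_gigantic(unsigned long long huge_size, unsigned long long base_size)
{
	return huge_order(huge_size, base_size) >= SKETCH_MAX_ORDER;
}
```

For example, max_buddy_bytes(64ULL << 10) is 64MB, is_gigantic(64ULL << 20, 64ULL << 10) is false (order 10), and is_gigantic(512ULL << 20, 64ULL << 10) is true (order 13), which matches the 64MB limit for 64K pages given above.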
>
>>>
>>> So I guess we want to split this into two functions
>>> arch_hugepage_migration_supported and hugepage_movable. The latter would
>>
>> So the set difference between arch_hugepage_migration_supported and
>> hugepage_movable still remains unmigratable? Then what is the purpose
>> of the arch_hugepage_migration_supported page size set in the first place?
>> Does it mean we allow the migration at the beginning and then abort later
>> when the page size does not fall within the subset for hugepage_movable?
>> Could you please explain this in more detail?
>
> The purpose of arch_hugepage_migration_supported is to tell whether it
> makes any sense to even try the migration. The allocation placement is

That roughly matches what we have right now, and this series continues with it.

> completely independent of this choice. The latter just says whether it is
> feasible to place a hugepage to the zone movable. Sure regular 2MB pages

What exactly do you mean by feasible? Won't it depend on the state of the
buddy allocator (ZONE_MOVABLE in particular) and its ability to accommodate
a given huge page? How can the platform decide that? Or, as I mentioned
before, is it the platform's willingness to live with unmovable huge pages
(of certain sizes) as a consequence of migration?

> do not guarantee movability as well because of the memory fragmentation.
> But allocating a 2MB is a completely different story from 1G or even
> larger huge pages, isn't it?

Right, I understand that. Hotplug/CMA capability degrades more when bigger
huge pages are unmovable on the system.

>
>>> be a reasonably migratable subset of the former. Without that this
>>> patch might introduce subtle regressions when somebody relies on movable
>>> zone to be really movable.
>> PUD based HugeTLB pages were never migratable, then how can there be any
>> regression here?
>
> That means that they haven't been allocated from the movable zone
> before.
Which is going to change with this patch. The source PUD huge page might
have been allocated from the movable zone. The denial of migration is
explicit: it happens because we don't check for PUD_SHIFT in there, and it
has nothing to do with the zone the source page belongs to.

Or are you referring to a regression caused by something like this?

Before the patch:

- PUD huge page allocated on ZONE_MOVABLE
- Huge page is movable (Hotplug/CMA works)

After the patch:

- PUD huge page allocated on ZONE_MOVABLE
- Migration is successful without checking the destination page's zone
- PUD huge page (new) is not on ZONE_MOVABLE
- Huge page is unmovable (Hotplug/CMA does not work anymore) -> Regression!

>
>> At present we even support PGD based HugeTLB pages for
>> migration.
>
> And that is already wrong but nobody probably cares because those are
> rarely used.
>
>> Wondering how PUD based ones are going to be any different.
>
> It is not different, PGD is dubious already.
>

Got it.
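The split Michal proposes can be read as two predicates, where the movable one is a subset of the migratable one. Below is a minimal userspace C sketch of that reading, assuming an arm64 4K-page configuration simplified to PMD (2MB) and PUD (1GB) sizes, and MAX_ORDER = 11. Every sketch_* name is hypothetical and only stands in for the arch_hugepage_migration_supported/hugepage_movable pair discussed in the thread; it is not the kernel's implementation. Under this model a 1GB PUD page may still be migrated, but new pages of that size are never placed in ZONE_MOVABLE, so they cannot become unmovable occupants of the zone that hotplug/CMA rely on.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed arm64 4K-page configuration (simplified: PMD and PUD only). */
#define SKETCH_PAGE_SHIFT 12	/* 4K base pages */
#define SKETCH_PMD_SHIFT 21	/* 2MB huge pages */
#define SKETCH_PUD_SHIFT 30	/* 1GB huge pages */
#define SKETCH_MAX_ORDER 11

/* Hypothetical stand-in for struct hstate: just the huge page shift. */
struct sketch_hstate {
	unsigned int shift;
};

enum sketch_zone { SKETCH_ZONE_NORMAL, SKETCH_ZONE_MOVABLE };

/* Hypothetical arch_hugepage_migration_supported(): does it make any
 * sense to even try migrating huge pages of this size? */
bool sketch_arch_migration_supported(const struct sketch_hstate *h)
{
	return h->shift == SKETCH_PMD_SHIFT || h->shift == SKETCH_PUD_SHIFT;
}

/* Gigantic: order >= MAX_ORDER, i.e. beyond what the buddy allocator
 * can hand out as a migration target. */
bool sketch_is_gigantic(const struct sketch_hstate *h)
{
	return h->shift >= SKETCH_PAGE_SHIFT + SKETCH_MAX_ORDER;
}

/* Hypothetical hugepage_movable(): the subset of migratable sizes that
 * is reasonable to place in ZONE_MOVABLE, because a replacement page can
 * still be found in the buddy allocator at runtime. */
bool sketch_hugepage_movable(const struct sketch_hstate *h)
{
	return sketch_arch_migration_supported(h) && !sketch_is_gigantic(h);
}

/* Zone a newly allocated huge page of this size would be placed in. */
enum sketch_zone sketch_alloc_zone(const struct sketch_hstate *h)
{
	bool movable = sketch_hugepage_movable(h);

	/* Invariant of the split: movable sizes are always migratable. */
	assert(!movable || sketch_arch_migration_supported(h));
	return movable ? SKETCH_ZONE_MOVABLE : SKETCH_ZONE_NORMAL;
}
```

With these assumptions a 2MB PMD page is both migratable and placed in ZONE_MOVABLE, while a 1GB PUD page is migratable but allocated outside ZONE_MOVABLE. Keeping the two decisions separate means enabling PUD migration does not, by itself, change where PUD pages are allocated.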