On Mon 06-11-17 19:32:37, Michal Hocko wrote:
> On Mon 06-11-17 20:13:36, Maxim Levitsky wrote:
> > Yes, I tested git head from mainline and a few kernels from ubuntu
> > repos since I was lazy to compile them too.
>
> OK, so this hasn't worked reliably, as I suspected.
>
> > Do you have an idea what can I do about this issue? Do you think it's
> > feasible to fix this?
>
> Well, I think that giga pages need quite some love to be usable
> reliably. The current implementation is more towards "make it work if
> there is enough unused memory".
>
> > And if not using the movable zone, how would it even be possible to
> > have a guaranteed allocation of 1G pages?
>
> Guaranteed giga pages are something the kernel is not yet ready to
> offer. Abusing zone movable might look like the right direction, but
> that is not really the case until we make sure those pages are
> migratable.
>
> There has been a simple patch which makes PUD (1GB) pages migratable
> http://lkml.kernel.org/r/20170913101047.GA13026@xxxxxxxxx but I've had
> concerns that it really didn't consider the migration path much
> http://lkml.kernel.org/r/20171003073301.hydw7jf2wztsx2om%40dhcp22.suse.cz
> I still believe the patch is not complete, but maybe it is not that far
> away from being so. E.g. the said pfn_range_valid_gigantic can be
> enhanced to make the migration much more reliable, or dropped
> altogether, because the pfn based allocator already knows how to do
> migration and other stuff.

Here is a first shot at the weird pfn_range_valid_gigantic. It is
completely (even compile) untested and should just give an idea; I will
think about it some more later. If you have a scratch system that you
are not afraid to play with, I would appreciate it if you could give it
a try.
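For reference, exercising this path from userspace could look roughly
like the sketch below. This is only an assumed test harness, not part of
the patch: it presumes an x86_64 box with 1GB huge page support and root
privileges, and the sysfs write is what should end up calling
alloc_gigantic_page() at runtime.

/*
 * Hypothetical test sketch (not part of the patch): grow the 1GB
 * hugetlb pool at runtime via sysfs - this is what invokes
 * alloc_gigantic_page() - and then try to fault one page in.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_HUGE_SHIFT
#define MAP_HUGE_SHIFT  26
#endif
#ifndef MAP_HUGE_1GB
#define MAP_HUGE_1GB    (30 << MAP_HUGE_SHIFT)
#endif

#define NR_1G_HUGEPAGES \
        "/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages"
#define GIGA    (1024UL * 1024 * 1024)

int main(void)
{
        int fd = open(NR_1G_HUGEPAGES, O_WRONLY);
        char *p;

        /* ask for two gigantic pages at runtime; needs root */
        if (fd < 0 || write(fd, "2", 1) != 1) {
                perror("reserving 1GB pages");
                return 1;
        }
        close(fd);

        /* map and touch one 1GB page from the freshly grown pool */
        p = mmap(NULL, GIGA, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_HUGE_1GB,
                 -1, 0);
        if (p == MAP_FAILED) {
                perror("mmap 1GB huge page");
                return 1;
        }
        memset(p, 0, GIGA);
        printf("got a 1GB page at %p\n", p);
        munmap(p, GIGA);
        return 0;
}

If nr_hugepages reads back smaller than what was written, the runtime
allocation (the loop changed below) failed, which is exactly the
interesting case to report.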
---
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 5ab12fda8ed5..17ca753560b7 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1071,34 +1071,6 @@ static int __alloc_gigantic_page(unsigned long start_pfn,
                                   gfp_mask);
 }
 
-static bool pfn_range_valid_gigantic(struct zone *z,
-                       unsigned long start_pfn, unsigned long nr_pages)
-{
-       unsigned long i, end_pfn = start_pfn + nr_pages;
-       struct page *page;
-
-       for (i = start_pfn; i < end_pfn; i++) {
-               if (!pfn_valid(i))
-                       return false;
-
-               page = pfn_to_page(i);
-
-               if (page_zone(page) != z)
-                       return false;
-
-               if (PageReserved(page))
-                       return false;
-
-               if (page_count(page) > 0)
-                       return false;
-
-               if (PageHuge(page))
-                       return false;
-       }
-
-       return true;
-}
-
 static bool zone_spans_last_pfn(const struct zone *zone,
                        unsigned long start_pfn, unsigned long nr_pages)
 {
@@ -1110,7 +1082,7 @@ static struct page *alloc_gigantic_page(int nid, struct hstate *h)
 {
        unsigned int order = huge_page_order(h);
        unsigned long nr_pages = 1 << order;
-       unsigned long ret, pfn, flags;
+       unsigned long ret, pfn;
        struct zonelist *zonelist;
        struct zone *zone;
        struct zoneref *z;
@@ -1119,28 +1091,29 @@ static struct page *alloc_gigantic_page(int nid, struct hstate *h)
        gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE;
        zonelist = node_zonelist(nid, gfp_mask);
        for_each_zone_zonelist_nodemask(zone, z, zonelist, gfp_zone(gfp_mask), NULL) {
-               spin_lock_irqsave(&zone->lock, flags);
 
                pfn = ALIGN(zone->zone_start_pfn, nr_pages);
                while (zone_spans_last_pfn(zone, pfn, nr_pages)) {
-                       if (pfn_range_valid_gigantic(zone, pfn, nr_pages)) {
-                               /*
-                                * We release the zone lock here because
-                                * alloc_contig_range() will also lock the zone
-                                * at some point. If there's an allocation
-                                * spinning on this lock, it may win the race
-                                * and cause alloc_contig_range() to fail...
-                                */
-                               spin_unlock_irqrestore(&zone->lock, flags);
-                               ret = __alloc_gigantic_page(pfn, nr_pages, gfp_mask);
-                               if (!ret)
-                                       return pfn_to_page(pfn);
-                               spin_lock_irqsave(&zone->lock, flags);
+                       struct page *page = pfn_to_online_page(pfn);
+
+                       /*
+                        * be careful about offline pageblocks and interleaving
+                        * zones
+                        */
+                       if (!page || page_zone(page) != zone) {
+                               pfn += pageblock_nr_pages;
+                               continue;
                        }
+                       if (PageReserved(page)) {
+                               pfn++;
+                               continue;
+                       }
+
+                       ret = __alloc_gigantic_page(pfn, nr_pages, gfp_mask);
+                       if (!ret)
+                               return pfn_to_page(pfn);
                        pfn += nr_pages;
                }
-
-               spin_unlock_irqrestore(&zone->lock, flags);
        }
 
        return NULL;
--
Michal Hocko
SUSE Labs