On 16.10.19 10:51, Michal Hocko wrote:
On Wed 16-10-19 10:08:21, David Hildenbrand wrote:
On 16.10.19 09:34, Anshuman Khandual wrote:
[...]
+static bool pfn_range_valid_contig(struct zone *z, unsigned long start_pfn,
+ unsigned long nr_pages)
+{
+ unsigned long i, end_pfn = start_pfn + nr_pages;
+ struct page *page;
+
+ for (i = start_pfn; i < end_pfn; i++) {
+ page = pfn_to_online_page(i);
+ if (!page)
+ return false;
+
+ if (page_zone(page) != z)
+ return false;
+
+ if (PageReserved(page))
+ return false;
+
+ if (page_count(page) > 0)
+ return false;
+
+ if (PageHuge(page))
+ return false;
+ }
We might still try to allocate a lot of ranges that contain unmovable data
(we could avoid isolating a lot of page blocks in the first place). I'd love
to see something like pfn_range_movable() (similar, but different to
is_mem_section_removable(), which uses has_unmovable_pages()).
Just to make sure I understand. Do you want has_unmovable_pages to be
called inside pfn_range_valid_contig?
I think this requires more thought, as has_unmovable_pages() works on
pageblocks only AFAIK. If you try to allocate < MAX_ORDER - 1, you could
get a lot of false positives.
E.g., if a free "MAX_ORDER - 1" page spans two pageblocks and you only
test the second pageblock, you might detect "unmovable" if not taking
proper care of the "bigger" free page. (alloc_contig_range() properly
works around that issue)
[...]
+struct page *alloc_contig_pages(unsigned long nr_pages, gfp_t gfp_mask,
+ int nid, nodemask_t *nodemask)
+{
+ unsigned long ret, pfn, flags;
+ struct zonelist *zonelist;
+ struct zone *zone;
+ struct zoneref *z;
+
+ zonelist = node_zonelist(nid, gfp_mask);
+ for_each_zone_zonelist_nodemask(zone, z, zonelist,
+ gfp_zone(gfp_mask), nodemask) {
One important part is to never use the MOVABLE zone here (otherwise
unmovable data would end up on the movable zone). But I guess the caller is
responsible for that (not pass GFP_MOVABLE) like gigantic pages do.
Well, if the caller uses GFP_MOVABLE then the movability should be
implemented in some form. If that is not the case then it is a bug on
the caller behalf.
+ spin_lock_irqsave(&zone->lock, flags);
+
+ pfn = ALIGN(zone->zone_start_pfn, nr_pages);
This alignment does not make too much sense when allowing passing in !power
of two orders. Maybe the caller should specify the requested alignment
instead? Or should we enforce this to be aligned to make our life easier for
now?
Are there any usecases that would require than the page alignment?
Gigantic pages have to be aligned AFAIK.
--
Thanks,
David / dhildenb