I can do it after I fix this. That change might or might not help only if we make
some redesign on how migratetype is managed. If MIGRATE_ISOLATE does not
overwrite existing migratetype, the code might not need to split a page and move
it to MIGRATE_ISOLATE freelist?
Did someone test how memory offlining plays along with that? (I can try myself
within the next 1-2 weeks)
There [mm/memory_hotplug.c:offline_pages] we always cover full MAX_ORDER ranges,
though.
ret = start_isolate_page_range(start_pfn, end_pfn,
MIGRATE_MOVABLE,
MEMORY_OFFLINE | REPORT_FAILURE,
GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL);
Since a full MAX_ORDER range is passed, no free page split will happen.
Okay, thanks for verifying that it should not be affected!
The fundamental issue in alloc_contig_range() is that to work at
pageblock level, a page (>pageblock_order) can have one part is isolated and
the rest is a different migratetype. {add_to,move_to,del_page_from}_free_list()
now checks first pageblock migratetype, so such a page needs to be removed
from its free_list, set MIGRATE_ISOLATE on one of the pageblock, split, and
finally put back to multiple free lists. This needs to be done at isolation stage
before free pages are removed from their free lists (the stage after isolation).
One idea was to always isolate larger chunks, and handle movability checks/split/etc
at a later stage. Once isolation would be decoupled from the actual/original migratetype,
the could have been easier to handle (especially some corner cases I had in mind back then).
I think it is a good idea. When I coded alloc_contig_range() up, I tried to
accommodate existing set_migratetype_isolate(), which calls has_unmovable_pages().
If these two are decoupled, set_migrateype_isolate() can work on MAX_ORDER-aligned
ranges and has_unmovable_pages() can still work on pageblock-aligned ranges.
Let me give this a try.
But again, just some thought I had back then, maybe it doesn't help for
anything; I found more time to look into the whole thing in more detail.
If MIGRATE_ISOLATE is a separate flag and we are OK with leaving isolated pages
in their original migratetype and check migratetype before allocating a page,
that might help. But that might add extra work (e.g., splitting a partially
isolated free page before allocation) in the really hot code path, which is not
desirable.
With MIGRATE_ISOLATE being a separate flag, one idea was to have not a single
separate isolate list, but one per "proper migratetype". But again, just some random
thoughts I had back then, I never had sufficient time to think it all through.
Got it. I will think about it.
One question on separate MIGRATE_ISOLATE:
the implementation I have in mind is that MIGRATE_ISOLATE will need a dedicated flag
bit instead of being one of migratetype. But now there are 5 migratetypes +
Exactly what I was concerned about back then ...
MIGRATE_ISOLATE and PB_migratetype_bits is 3, so an extra migratetype_bit is needed.
But current migratetype implementation is a word-based operation, requiring
NR_PAGEBLOCK_BITS to be divisor of BITS_PER_LONG. This means NR_PAGEBLOCK_BITS
needs to be increased from 4 to 8 to meet the requirement, wasting a lot of space.
... until I did the math. Let's assume a pageblock is 2 MiB.
4/(2* 1024 * 1024 * 8) = 0,00002384185791016 %
8/(2* 1024 * 1024 * 8) -> 1 / (2* 1024 * 1024) = 0,00004768371582031 %
For a 1 TiB machine that means 256 KiB vs. 512 KiB
I concluded that "wasting a lot of space" is not really the right word
to describe that :)
Just to put it into perspective, the memmap (64/4096) for a 1 TiB
machine is ... 16 GiB.
An alternative is to have a separate array for MIGRATE_ISOLATE, which requires
additional changes. Let me know if you have a better idea. Thanks.
It would probably be cleanest to just use one byte per pageblock. That
would cleanup the whole machinery eventually as well.
--
Cheers,
David / dhildenb