David Hildenbrand <david@xxxxxxxxxx> writes: > On 15.06.23 09:22, Huang, Ying wrote: >> Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx> writes: >> >>> On 6/15/2023 11:22 AM, Huang, Ying wrote: >>>> Hi, Mel, >>>> Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> writes: >>>> >>>>> On Tue, Jun 13, 2023 at 04:55:04PM +0800, Baolin Wang wrote: >>>>>> On some machines, the normal zone can have a large memory hole like >>>>>> below memory layout, and we can see the range from 0x100000000 to >>>>>> 0x1800000000 is a hole. So when isolating some migratable pages, the >>>>>> scanner can meet the hole and it will take more time to skip the large >>>>>> hole. From my measurement, I can see the isolation scanner will take >>>>>> 80us ~ 100us to skip the large hole [0x100000000 - 0x1800000000]. >>>>>> >>>>>> So adding a new helper to fast search next online memory section >>>>>> to skip the large hole can help to find next suitable pageblock >>>>>> efficiently. With this patch, I can see the large hole scanning only >>>>>> takes < 1us. >>>>>> >>>>>> [ 0.000000] Zone ranges: >>>>>> [ 0.000000] DMA [mem 0x0000000040000000-0x00000000ffffffff] >>>>>> [ 0.000000] DMA32 empty >>>>>> [ 0.000000] Normal [mem 0x0000000100000000-0x0000001fa7ffffff] >>>>>> [ 0.000000] Movable zone start for each node >>>>>> [ 0.000000] Early memory node ranges >>>>>> [ 0.000000] node 0: [mem 0x0000000040000000-0x0000000fffffffff] >>>>>> [ 0.000000] node 0: [mem 0x0000001800000000-0x0000001fa3c7ffff] >>>>>> [ 0.000000] node 0: [mem 0x0000001fa3c80000-0x0000001fa3ffffff] >>>>>> [ 0.000000] node 0: [mem 0x0000001fa4000000-0x0000001fa402ffff] >>>>>> [ 0.000000] node 0: [mem 0x0000001fa4030000-0x0000001fa40effff] >>>>>> [ 0.000000] node 0: [mem 0x0000001fa40f0000-0x0000001fa73cffff] >>>>>> [ 0.000000] node 0: [mem 0x0000001fa73d0000-0x0000001fa745ffff] >>>>>> [ 0.000000] node 0: [mem 0x0000001fa7460000-0x0000001fa746ffff] >>>>>> [ 0.000000] node 0: [mem 0x0000001fa7470000-0x0000001fa758ffff] >>>>>> [ 0.000000] node 0: [mem 0x0000001fa7590000-0x0000001fa7ffffff] >>>>>> >>>>>> Signed-off-by: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx> >>>>> >>>>> This may only be necessary for non-contiguous zones so a check for >>>>> zone_contiguous could be made but I suspect the saving, if any, would be >>>>> marginal. >>>>> >>>>> However, it's subtle that block_end_pfn can end up in an arbirary location >>>>> past the end of the zone or past cc->free_pfn. As the "continue" will update >>>>> cc->migrate_pfn, that might lead to errors in the future. It would be a >>>>> lot safer to pass in cc->free_pfn and do two things with the value. First, >>>>> there is no point scanning for a valid online section past cc->free_pfn so >>>>> terminating after cc->free_pfn may save some cycles. Second, cc->migrate_pfn >>>>> does not end up with an arbitrary value which is a more defensive approach >>>>> to any future programming errors. >>>> I have thought about this before. Originally, I had thought that we >>>> were safe because cc->free_pfn should be in a online section and >>>> block_end_pfn should reach cc->free_pfn before the end of zone. But >>>> after checking more code and thinking about it again, I found that the >>>> underlying sections may go offline under us during compaction. So that, >>>> cc->free_pfn may be in a offline section or after the end of zone. So, >>>> you are right, we need to consider the range of block_end_pfn. >>>> But, if we thought in this way (memory online/offline at any time), >>>> it >>>> appears that we need to check whether the underlying section was >>>> offlined. For example, is it safe to use "pfn_to_page()" in >>>> "isolate_migratepages_block()"? Is it possible for the underlying >>>> section to be offlined under us? >>> >>> It is possible. There is a previous discussion[1] about the race >>> between pfn_to_online_page() and memory offline. >>> >>> [1] >>> https://lore.kernel.org/lkml/87zgc6buoq.fsf@xxxxxxxxxx/T/#m642d91bcc726437e1848b295bc57ce249c7ca399 >> Thank you very much for sharing! That answers my questions >> directly! > > I remember another discussion (but can't find it) regarding why memory > compaction can get away without pfn_to_online_page() all over the > place. The use is limited to __reset_isolation_pfn(). Per my understanding, isolate_migratepages() -> pageblock_pfn_to_page() will check whether the pageblock is online. So if the pageblock isn't offlined afterwards, we can use pfn_to_page(). > But yes, in theory pfn_to_online_page() can race with memory offlining. Thanks for confirmation. Best Regards, Huang, Ying