Hi Michal,

On 11/05/18 at 10:28am, Michal Hocko wrote:
>
> Or something like this. Ugly as hell, no question about that. I also
> have to think about this some more to convince myself this will not
> result in an endless loop under some situations.

It failed. I have pasted the patch diff and the log here; please help
check whether I made any mistake when applying the code change manually.
The log is at the bottom.

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a919ba5cb3c8..cdcd923ec337 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7779,14 +7779,22 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 	pfn = page_to_pfn(page);
 	for (found = 0, iter = 0; iter < pageblock_nr_pages; iter++) {
 		unsigned long check = pfn + iter;
+		unsigned long saved_flags;

 		if (!pfn_valid_within(check))
 			continue;

 		page = pfn_to_page(check);

-		if (PageReserved(page))
+retry:
+		saved_flags = READ_ONCE(page->flags);
+
+
+		if (PageReserved(page)) {
+			pr_info("has_unmovable_pages 000: pfn:0x%x\n", pfn+iter);
+			__dump_page(page, "hotplug");
 			goto unmovable;
+		}

 		/*
 		 * Hugepages are not in LRU lists, but they're movable.
@@ -7795,8 +7803,11 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 		 */
 		if (PageHuge(page)) {

-			if (!hugepage_migration_supported(page_hstate(page)))
+			if (!hugepage_migration_supported(page_hstate(page))) {
+				pr_info("has_unmovable_pages 111: pfn:0x%x\n", pfn+iter);
+				__dump_page(page, "hotplug");
 				goto unmovable;
+			}

 			iter = round_up(iter + 1, 1<<compound_order(page)) - 1;
 			continue;
@@ -7824,8 +7835,29 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 		if (__PageMovable(page))
 			continue;

-		if (!PageLRU(page))
+#if 0
+		if (!PageLRU(page) && (get_pageblock_migratetype(page)!=MIGRATE_MOVABLE) )
 			found++;
+#endif
+		if (PageLRU(page))
+			continue;
+
+		if (PageSwapBacked(page))
+			continue;
+
+
+		if (page->mapping && !page->mapping->a_ops)
+			pr_info("page->mapping:%ps \n", page->mapping);
+
+		if (page->mapping && page->mapping->a_ops && page->mapping->a_ops->migratepage)
+			continue;
+
+		/*
+		 * We might race with the allocation of the page so retry
+		 * if flags have changed.
+		 */
+		if (saved_flags != READ_ONCE(page->flags))
+			goto retry;
 		/*
 		 * If there are RECLAIMABLE pages, we need to check
 		 * it.  But now, memory offline itself doesn't call
@@ -7839,8 +7871,11 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 		 * is set to both of a memory hole page and a _used_ kernel
 		 * page at boot.
 		 */
-		if (found > count)
+		if (++found > count) {
+			pr_info("has_unmovable_pages: pfn:0x%x, found:0x%x, count:0x%x \n", pfn+iter, found, count);
+			__dump_page(page, "hotplug");
 			goto unmovable;
+		}
 	}
 	return false;
 unmovable:

***********console log*******************
[ 458.584711] Offlined Pages 524288
[ 458.943655] Offlined Pages 524288
[ 459.390757] Offlined Pages 524288
[ 460.086409] Offlined Pages 524288
[ 460.931868] Offlined Pages 524288
[ 461.741327] Offlined Pages 524288
[ 462.576653] Offlined Pages 524288
[ 463.291947] Offlined Pages 524288
[ 464.121980] Offlined Pages 524288
[ 464.869983] Offlined Pages 524288
[ 465.550254] Offlined Pages 524288
[ 466.337934] Offlined Pages 524288
[ 467.143416] Offlined Pages 524288
[ 467.925108] Offlined Pages 524288
[ 468.665318] Offlined Pages 524288
[ 469.473999] Offlined Pages 524288
[ 470.390116] Offlined Pages 524288
[ 471.069104] Offlined Pages 524288
[ 471.704154] Offlined Pages 524288
[ 472.322466] Offlined Pages 524288
[ 472.964513] Offlined Pages 524288
[ 473.629328] Offlined Pages 524288
[ 474.265908] Offlined Pages 524288
[ 474.883829] Offlined Pages 524288
[ 475.538700] Offlined Pages 524288
[ 476.247451] Offlined Pages 524288
[ 476.575516] has_unmovable_pages: pfn:0x10dfec00, found:0x1, count:0x0
[ 476.582103] page:ffffea0437fb0000 count:1 mapcount:1 mapping:ffff880e05239841 index:0x7f26e5000 compound_mapcount: 1
[ 476.592645] flags: 0x5fffffc0090034(uptodate|lru|active|head|swapbacked)
[ 476.599386] raw: 005fffffc0090034 ffffea043bd58008 ffffea0437fb8008 ffff880e05239841
[ 476.607154] raw: 00000007f26e5000 0000000000000000 00000001ffffffff ffff880e74f5c000
[ 476.616725] page dumped because: hotplug
[ 476.620682] page->mem_cgroup:ffff880e74f5c000
[ 476.625190] WARNING: CPU: 245 PID: 8 at mm/page_alloc.c:7882 has_unmovable_pages.cold.123+0x44/0xb6
[ 476.634230] Modules linked in: vfat fat intel_rapl sb_edac x86_pkg_temp_thermal coretemp kvm_intel kvm irqbypass crct10dif_pclmul iTCO_wdt crc32_pclmul iTCO_vendor_support ghash_clmulni_intel intel_cstate joydev ses ipmi_si enclosure ipmi_devintf scsi_transport_sas intel_uncore ipmi_msghandler pcspkr intel_rapl_perf sg mei_me i2c_i801 mei lpc_ich wmi xfs libcrc32c sd_mod ahci igb crc32c_intel libahci i2c_algo_bit dca libata megaraid_sas dm_mirror dm_region_hash dm_log dm_mod
[ 476.678239] CPU: 245 PID: 8 Comm: kworker/u576:0 Not tainted 4.19.0+ #9
[ 476.684871] Hardware name: 9008/IT91SMUB, BIOS BLXSV512 03/22/2018
[ 476.691199] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
[ 476.696678] RIP: 0010:has_unmovable_pages.cold.123+0x44/0xb6
[ 476.702369] Code: fe 0f 0e 82 4c 89 ff e8 0f a0 02 00 48 8b 44 24 10 48 2b 40 50 48 89 c2 b8 01 00 00 00 48 81 fa 40 11 00 00 0f 85 ec eb ff ff <0f> 0b e9 e5 eb ff ff 48 89 de 48 c7 c7 08 4d 0a 82 e8 79 1e f0 ff
[ 476.721100] RSP: 0018:ffffc900000e3c70 EFLAGS: 00010046
[ 476.726361] RAX: 0000000000000001 RBX: 0000000010dfec00 RCX: 0000000000000006
[ 476.733543] RDX: 0000000000001140 RSI: 0000000000000096 RDI: ffff880e7cb55ad0
[ 476.742768] RBP: 005fffffc0010000 R08: 0000000000000bbf R09: 0000000000000007
[ 476.749926] R10: 0000000000000000 R11: ffffffff829f162d R12: 0000000010dfec00
[ 476.757082] R13: 0000000000000001 R14: 0000000000000000 R15: ffffea0437fb0000
[ 476.764241] FS: 0000000000000000(0000) GS:ffff880e7cb40000(0000) knlGS:0000000000000000
[ 476.772338] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 476.778102] CR2: 00007fc3670f3000 CR3: 0000000e716c8003 CR4: 00000000003606e0
[ 476.785249] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 476.792405] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 476.799562] Call Trace:
[ 476.804039] start_isolate_page_range+0x258/0x2f0
[ 476.808823] __offline_pages+0xcc/0x8e0
[ 476.812753] ? klist_next+0xf2/0x100
[ 476.816402] ? device_is_dependent+0x90/0x90
[ 476.820759] memory_subsys_offline+0x40/0x60
[ 476.825127] device_offline+0x81/0xb0
[ 476.828920] acpi_bus_offline+0xdb/0x140
[ 476.832937] acpi_device_hotplug+0x21c/0x460
[ 476.837281] acpi_hotplug_work_fn+0x1a/0x30
[ 476.841562] process_one_work+0x1a1/0x3a0
[ 476.845647] worker_thread+0x30/0x380
[ 476.849381] ? drain_workqueue+0x120/0x120
[ 476.853549] kthread+0x112/0x130
[ 476.856866] ? kthread_park+0x80/0x80
[ 476.860588] ret_from_fork+0x35/0x40
[ 476.864204] ---[ end trace 08fb4fe25cf760b3 ]---
[ 476.955547] has_unmovable_pages: pfn:0x10e07a00, found:0x1, count:0x0
[ 476.962126] page:ffffea04381e8000 count:1 mapcount:1 mapping:ffff880e0913d899 index:0x7f26ec600 compound_mapcount: 1
[ 476.972673] flags: 0x5fffffc0090034(uptodate|lru|active|head|swapbacked)
[ 476.979413] raw: 005fffffc0090034 ffffea043c338008 ffffea043f5b0008 ffff880e0913d899
[ 476.987192] raw: 00000007f26ec600 0000000000000000 00000001ffffffff ffff880e74f5c000
[ 476.996921] page dumped because: hotplug
[ 477.000880] page->mem_cgroup:ffff880e74f5c000
[ 477.110154] has_unmovable_pages: pfn:0x10e9ee00, found:0x1, count:0x0
[ 477.118626] page:ffffea043a7b8000 count:1 mapcount:1 mapping:ffff880e0c89c2c1 index:0x7f26e5000 compound_mapcount: 1
[ 477.129176] flags: 0x5fffffc0090034(uptodate|lru|active|head|swapbacked)
[ 477.135911] raw: 005fffffc0090034 ffffea043b0e0008 ffffea04383e8008 ffff880e0c89c2c1
[ 477.143690] raw: 00000007f26e5000 0000000000000000 00000001ffffffff ffff880e74f5c000
[ 477.151448] page dumped because: hotplug
[ 477.155404] page->mem_cgroup:ffff880e74f5c000
[ 477.224784] has_unmovable_pages: pfn:0x10f13600, found:0x1, count:0x0
[ 477.231368] page:ffffea043c4d8000 count:1 mapcount:1 mapping:ffff880e57b7adc1 index:0x7f26e8600 compound_mapcount: 1
[ 477.241922] flags: 0x5fffffc0090034(uptodate|lru|active|head|swapbacked)
[ 477.250324] raw: 005fffffc0090034 ffffea043af88008 ffffea043cf20508 ffff880e57b7adc1
[ 477.258089] raw: 00000007f26e8600 0000000000000000 00000001ffffffff ffff880e74f5c000
[ 477.265857] page dumped because: hotplug
[ 477.269811] page->mem_cgroup:ffff880e74f5c000
[ 477.307236] has_unmovable_pages: pfn:0x10f8da00, found:0x1, count:0x0
[ 477.313807] page:ffffea043e368000 count:1 mapcount:1 mapping:ffff880e75132529 index:0x7f26e1600 compound_mapcount: 1
[ 477.324361] flags: 0x5fffffc0090034(uptodate|lru|active|head|swapbacked)
[ 477.331096] raw: 005fffffc0090034 ffffea043d2c0008 ffffea043ba40008 ffff880e75132529
[ 477.338875] raw: 00000007f26e1600 0000000000000000 00000001ffffffff ffff880e74f5c000
[ 477.346635] page dumped because: hotplug
[ 477.350590] page->mem_cgroup:ffff880e74f5c000
[ 477.380478] has_unmovable_pages: pfn:0x10d87200, found:0x1, count:0x0
[ 477.387060] page:ffffea04361c8000 count:1 mapcount:1 mapping:ffff880e0913d899 index:0x7f26e2400 compound_mapcount: 1
[ 477.397610] flags: 0x5fffffc0090034(uptodate|lru|active|head|swapbacked)
[ 477.404340] raw: 005fffffc0090034 ffffea0437fb8008 ffffea043cd20008 ffff880e0913d899
[ 477.412113] raw: 00000007f26e2400 0000000000000000 00000001ffffffff ffff880e74f5c000
[ 477.419870] page dumped because: hotplug
[ 477.423842] page->mem_cgroup:ffff880e74f5c000
[ 477.435557] memory memory539: Offline failed.
[ 489.171077] perf: interrupt took too long (2745 > 2500), lowering kernel.perf_event_max_sample_rate to 72000
[ 501.332276] INFO: NMI handler (ghes_notify_nmi) took too long to run: 2.179 msecs
[ 511.073564] perf: interrupt took too long (3593 > 3431), lowering kernel.perf_event_max_sample_rate to 55000
[ 521.050208] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 1.836 msecs
[ 521.058339] perf: interrupt took too long (16324 > 4491), lowering kernel.perf_event_max_sample_rate to 12000
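
For reference, the snapshot-and-recheck idea from the diff can be sketched in
userspace C as below. This is only an illustration under stated assumptions,
not the kernel code: fake_page, FAKE_PG_LRU, MAX_RETRIES and looks_unmovable()
are hypothetical names, the bit position is made up, and the retry bound is an
assumption added here (the diff above retries without a bound, which is where
the endless-loop concern comes from).

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; only the flags word matters here. */
struct fake_page {
	_Atomic unsigned long flags;
};

#define FAKE_PG_LRU	(1UL << 4)	/* illustrative bit, not the kernel layout */
#define MAX_RETRIES	3		/* assumed bound; the diff has none */

/*
 * Snapshot the flags, test the snapshot, then re-read the flags.
 * If they changed in between, retry, at most MAX_RETRIES times.
 * Once the bound is exhausted, conservatively report "unmovable".
 */
bool looks_unmovable(struct fake_page *page)
{
	for (int attempt = 0; attempt <= MAX_RETRIES; attempt++) {
		unsigned long saved =
			atomic_load_explicit(&page->flags, memory_order_relaxed);

		if (saved & FAKE_PG_LRU)
			return false;	/* on the LRU: treated as movable */

		if (saved == atomic_load_explicit(&page->flags,
						  memory_order_relaxed))
			return true;	/* stable snapshot, not on the LRU */
	}
	return true;	/* flags kept changing: give up conservatively */
}
```

Unlike the diff, this sketch tests the saved snapshot rather than re-reading
the live flags in each check, so every decision is made on one consistent value.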