On Wed, Aug 29, 2018 at 11:16:30AM -0400, Masayoshi Mizuma wrote: > Hi Horiguchi-san and Pavel > > Thank you for your comments! > The Pavel's additional patch looks good to me, so I will add it to this series. > > However, unfortunately, the movable_node option has something wrong yet... > When I offline the memory which belongs to movable zone, I got the following > warning. I'm trying to debug it. > > I try to describe the issue as following. > If you have any comments, please let me know. > > WARNING: CPU: 156 PID: 25611 at mm/page_alloc.c:7730 has_unmovable_pages+0x1bf/0x200 > RIP: 0010:has_unmovable_pages+0x1bf/0x200 > ... > Call Trace: > is_mem_section_removable+0xd3/0x160 > show_mem_removable+0x8e/0xb0 > dev_attr_show+0x1c/0x50 > sysfs_kf_seq_show+0xb3/0x110 > seq_read+0xee/0x480 > __vfs_read+0x36/0x190 > vfs_read+0x89/0x130 > ksys_read+0x52/0xc0 > do_syscall_64+0x5b/0x180 > entry_SYSCALL_64_after_hwframe+0x44/0xa9 > RIP: 0033:0x7fe7b7823f70 > ... > > I added a printk to catch the unmovable page. > --- > @@ -7713,8 +7719,12 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count, > * is set to both of a memory hole page and a _used_ kernel > * page at boot. > */ > - if (found > count) > + if (found > count) { > + pr_info("DEBUG: %s zone: %lx page: %lx pfn: %lx flags: %lx found: %ld count: %ld \n", > + __func__, zone, page, page_to_pfn(page), page->flags, found, count); > goto unmovable; > + } > --- > > Then I got the following. The page (PFN: 0x1c0ff130d) flag is > 0xdfffffc0040048 (uptodate|active|swapbacked) > > --- > DEBUG: has_unmovable_pages zone: 0xffff8c0ffff80380 page: 0xffffea703fc4c340 pfn: 0x1c0ff130d flags: 0xdfffffc0040048 found: 1 count: 0 > --- > > And I got the owner from /sys/kernel/debug/page_owner. > > Page allocated via order 0, mask 0x6280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO) > PFN 7532909325 type Movable Block 14712713 type Movable Flags 0xdfffffc0040048(uptodate|active|swapbacked) > __alloc_pages_nodemask+0xfc/0x270 > alloc_pages_vma+0x7c/0x1e0 > handle_pte_fault+0x399/0xe50 > __handle_mm_fault+0x38e/0x520 > handle_mm_fault+0xdc/0x210 > __do_page_fault+0x243/0x4c0 > do_page_fault+0x31/0x130 > page_fault+0x1e/0x30 > > The page is allocated as anonymous page via page fault. > I'm not sure, but lru flag should be added to the page...? There is a small window of no PageLRU flag just after page allocation until the page is linked to some LRU list. This kind of unmovability is transient, so retrying can work. I guess that this warning seems to be visible since commit 15c30bc09085 ("mm, memory_hotplug: make has_unmovable_pages more robust") which turned off the optimization based on the assumption that pages under ZONE_MOVABLE are always movable. I think that it helps developers find the issue that permanently unmovable pages are accidentally located in ZONE_MOVABLE zone. But even ZONE_MOVABLE zone could have transiently unmovable pages, so the reported warning seems to me a false charge and should be avoided. Doing lru_add_drain_all()/drain_all_pages() before has_unmovable_pages() might be helpful? Thanks, Naoya Horiguchi