On 11/02/2017 09:32 AM, Michal Hocko wrote:
On Tue 31-10-17 11:50:02, Pavel Tatashin wrote:
[...]
The problem happens in this path:
page_alloc_init_late
  deferred_init_memmap
    deferred_init_range
      __def_free
        deferred_free_range
          __free_pages_boot_core(page, order)
            __free_pages()
              __free_pages_ok()
                free_one_page()
                  __free_one_page(page, pfn, zone, order, migratetype);
deferred_init_range() initializes one page at a time by calling
__init_single_page(). Once it has initialized pageblock_nr_pages pages, it
calls deferred_free_range() to free the initialized pages to the buddy
allocator. Eventually, we reach __free_one_page(), where we compute the
buddy page:
	buddy_pfn = __find_buddy_pfn(pfn, order);
	buddy = page + (buddy_pfn - pfn);
buddy_pfn is computed as pfn ^ (1 << order), which here is
pfn + pageblock_nr_pages. Therefore, the buddy page is the first page past
the range that has just been initialized, and we access that page in this
function. Also, when we later return to deferred_init_range(), the buddy
page is initialized again. So, to avoid this issue, we must initialize the
buddy page before calling deferred_free_range().
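
The arithmetic above can be checked in isolation. The following is only a
userspace sketch, not kernel code: buddy_pfn_of() repeats the XOR quoted
above, while buddy_is_uninitialized() and its start/end parameters are
hypothetical names introduced for illustration.

```c
#include <stdbool.h>

/* Flip bit 'order' of the pfn, as in the expression quoted above. */
static unsigned long buddy_pfn_of(unsigned long pfn, unsigned int order)
{
	return pfn ^ (1UL << order);
}

/*
 * Hypothetical helper: true when the buddy of the block at 'pfn' lies
 * outside the pfn range [start, end) that has been initialized so far.
 */
static bool buddy_is_uninitialized(unsigned long pfn, unsigned int order,
				   unsigned long start, unsigned long end)
{
	unsigned long buddy = buddy_pfn_of(pfn, order);

	return buddy < start || buddy >= end;
}
```

Assuming order 9 (512 pages per block, an assumed value) and a block at pfn
0x10000, buddy_pfn_of() gives 0x10200, which is the first pfn past the block.
If only [0x10000, 0x10200) has been initialized, buddy_is_uninitialized()
reports true for that block.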
How come we didn't have this problem previously? I am really confused.
Hi Michal,
Previously, as in before my project? That is because the memory for all
struct pages was always zeroed in memblock, so in __free_one_page()
page_is_buddy() always returned false, and we never tried to incorrectly
remove the buddy from the list:
	list_del(&buddy->lru);
Now that the memory is not zeroed, page_is_buddy() can return true after
kexec, when the memory is dirty (unfortunately, memset(1) with
CONFIG_DEBUG_VM does not catch this case). We then proceed to incorrectly
remove the buddy from the list.
This is why we must initialize the computed buddy page beforehand.
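
To make the difference between zeroed and dirty memory concrete, here is a
toy model. It is only a sketch: struct toy_page and the toy_* functions are
hypothetical and do not match the real struct page layout, and
toy_page_is_buddy() models only the flag-and-order part of the
page_is_buddy() check.

```c
#include <stdbool.h>
#include <string.h>

/* Toy stand-in for struct page; the real layout is different. */
struct toy_page {
	bool buddy_flag;	/* models the PageBuddy state */
	unsigned int order;	/* models the stored free-block order */
};

/* Models only the flag-and-order part of the page_is_buddy() check. */
static bool toy_page_is_buddy(const struct toy_page *buddy, unsigned int order)
{
	return buddy->buddy_flag && buddy->order == order;
}

/* Models initializing a struct page before the allocator looks at it. */
static void toy_init_single_page(struct toy_page *page)
{
	memset(page, 0, sizeof(*page));
}

/* Leftover contents such as a previous kernel might have left behind. */
static struct toy_page toy_stale_page(unsigned int order)
{
	struct toy_page page = { .buddy_flag = true, .order = order };

	return page;
}
```

In this model a page left over from toy_stale_page(9) passes
toy_page_is_buddy(&page, 9), whereas the same page after
toy_init_single_page() does not, which mirrors why the zeroed memory hid the
problem before.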
Pasha