On Tue, Dec 11, 2018 at 04:23:05PM +0100, Zaslonko Mikhail wrote:
>Hello,
>
>On 11.12.2018 02:50, Wei Yang wrote:
>> On Mon, Dec 10, 2018 at 05:14:36PM +0100, Zaslonko Mikhail wrote:
>>> Hello,
>>>
>>> On 10.12.2018 16:10, Wei Yang wrote:
>>>> On Mon, Dec 10, 2018 at 02:07:12PM +0100, Mikhail Zaslonko wrote:
>>>>> If memory end is not aligned with the sparse memory section boundary, the
>>>>> mapping of such a section is only partly initialized. This may lead to
>>>>> VM_BUG_ON due to uninitialized struct page access from
>>>>> is_mem_section_removable() or test_pages_in_a_zone() function triggered by
>>>>> memory_hotplug sysfs handlers:
>>>>>
>>>>> page:000003d082008000 is uninitialized and poisoned
>>>>> page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
>>>>> Call Trace:
>>>>> ([<0000000000385b26>] test_pages_in_a_zone+0xde/0x160)
>>>>> [<00000000008f15c4>] show_valid_zones+0x5c/0x190
>>>>> [<00000000008cf9c4>] dev_attr_show+0x34/0x70
>>>>> [<0000000000463ad0>] sysfs_kf_seq_show+0xc8/0x148
>>>>> [<00000000003e4194>] seq_read+0x204/0x480
>>>>> [<00000000003b53ea>] __vfs_read+0x32/0x178
>>>>> [<00000000003b55b2>] vfs_read+0x82/0x138
>>>>> [<00000000003b5be2>] ksys_read+0x5a/0xb0
>>>>> [<0000000000b86ba0>] system_call+0xdc/0x2d8
>>>>> Last Breaking-Event-Address:
>>>>> [<0000000000385b26>] test_pages_in_a_zone+0xde/0x160
>>>>> Kernel panic - not syncing: Fatal exception: panic_on_oops
>>>>>
>>>>> Fix the problem by initializing the last memory section of the highest zone
>>>>> in memmap_init_zone() till the very end, even if it goes beyond the zone
>>>>> end.
>>>>>
>>>>> Signed-off-by: Mikhail Zaslonko <zaslonko@xxxxxxxxxxxxx>
>>>>> Reviewed-by: Gerald Schaefer <gerald.schaefer@xxxxxxxxxx>
>>>>> Cc: <stable@xxxxxxxxxxxxxxx>
>>>>> ---
>>>>>  mm/page_alloc.c | 15 +++++++++++++++
>>>>>  1 file changed, 15 insertions(+)
>>>>>
>>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>>> index 2ec9cc407216..41ef5508e5f1 100644
>>>>> --- a/mm/page_alloc.c
>>>>> +++ b/mm/page_alloc.c
>>>>> @@ -5542,6 +5542,21 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>>>>>  			cond_resched();
>>>>>  		}
>>>>>  	}
>>>>> +#ifdef CONFIG_SPARSEMEM
>>>>> +	/*
>>>>> +	 * If there is no zone spanning the rest of the section
>>>>> +	 * then we should at least initialize those pages. Otherwise we
>>>>> +	 * could blow up on a poisoned page in some paths which depend
>>>>> +	 * on full sections being initialized (e.g. memory hotplug).
>>>>> +	 */
>>>>> +	if (end_pfn == max_pfn) {
>>>>> +		while (end_pfn % PAGES_PER_SECTION) {
>>>>> +			__init_single_page(pfn_to_page(end_pfn), end_pfn, zone,
>>>>> +						nid);
>>>>> +			end_pfn++;
>>>>> +		}
>>>>> +	}
>>>>> +#endif
>>>>
>>>> If my understanding is correct, end_pfn is outside the valid range.
>>>>
>>>> memmap_init_zone() initializes the range [start_pfn, start_pfn + size). I
>>>> am afraid this will break the syntax.
>>>>
>>>> And max_pfn is also not a valid one. For example, on x86,
>>> I used max_pfn here to check for the highest zone. What would be a better way?
>>>
>>>> update_end_of_memory_vars() will update max_pfn, which is calculated by:
>>>>
>>>> end_pfn = PFN_UP(start + size);
>>>>
>>>> BTW, as you mentioned, this applies to the hotplug case. Then why couldn't
>>>> this happen during boot up? What differs between these two cases?
>>>
>>> Well, the pages are left uninitialized during bootup (the initial problem), but the panic itself takes
>>> place when we try to process memory_hotplug sysfs attributes (thus triggering sysfs handlers).
>>> You can find more details in the original thread:
>>> https://marc.info/?t=153658306400001&r=1&w=2
>>>
>>
>> Thanks.
>>
>> I took a look at the original thread and tried to reproduce this on x86.
>>
>> My steps are:
>>
>> 1. config page_poisoning
>> 2. use kernel parameter mem=3075M
>> 3. cat the removable sysfs file of the last memory block device,
>>    e.g. when mem is 3075M, cat memory9/removable
>>
>> I don't see the call trace. Am I missing something needed to reproduce it?
>>
>
>No, you don't. I guess there might be deviations depending on the architecture (I am on s390).
>As I understand, memory block size is 384 MB on your system and memory9 is the last block on the list?

Sorry, my calculation is not correct. The last memory_block is 23 instead of 9.

>BTW, do you have CONFIG_DEBUG_VM and CONFIG_DEBUG_VM_PGFLAGS on?
>

Yes, I have set both:

CONFIG_DEBUG_VM=y
CONFIG_DEBUG_VM_PGFLAGS=y

And the kernel cmdline is:

BOOT_IMAGE=/vmlinuz-4.20.0-rc5+ root=UUID=98aa84d6-9ba6-4033-ab91-9ca6fe3dd74f ro \
resume=UUID=b7c21053-d9c1-4e58-8488-7d385f8ee107 console=ttyS0 \
LANG=en_US.UTF-8 mem=3075M

>
>>>>
>>>>>  }
>>>>>
>>>>>  #ifdef CONFIG_ZONE_DEVICE
>>>>> --
>>>>> 2.16.4
>>>>
>>>
>>> Thanks,
>>> Mikhail Zaslonko
>>

-- 
Wei Yang
Help you, Help me