On Sat, Feb 23, 2019 at 04:04:16PM -0500, Sasha Levin wrote:
> From: Michal Hocko <mhocko@xxxxxxxx>
>
> [ Upstream commit efad4e475c312456edb3c789d0996d12ed744c13 ]

There is a fix for this fix [1]. It's commit
891cb2a72d821f930a39d5900cb7a3aa752c1d5b ("mm, memory_hotplug: fix
off-by-one in is_pageblock_removable") in mainline.

[1] https://lore.kernel.org/lkml/20190218181544.14616-1-mhocko@xxxxxxxxxx/

> Patch series "mm, memory_hotplug: fix uninitialized pages fallouts", v2.
>
> Mikhail Zaslonko has posted fixes for the two bugs quite some time ago
> [1]. I have pushed back on those fixes because I believed that it is
> much better to plug the problem at the initialization time rather than
> play whack-a-mole all over the hotplug code and find all the places
> which expect the full memory section to be initialized.
>
> We have ended up with commit 2830bf6f05fb ("mm, memory_hotplug:
> initialize struct pages for the full memory section") merged and cause a
> regression [2][3]. The reason is that there might be memory layouts
> when two NUMA nodes share the same memory section so the merged fix is
> simply incorrect.
>
> In order to plug this hole we really have to be zone range aware in
> those handlers. I have split up the original patch into two. One is
> unchanged (patch 2) and I took a different approach for `removable'
> crash.
>
> [1] http://lkml.kernel.org/r/20181105150401.97287-2-zaslonko@xxxxxxxxxxxxx
> [2] https://bugzilla.redhat.com/show_bug.cgi?id=1666948
> [3] http://lkml.kernel.org/r/20190125163938.GA20411@xxxxxxxxxxxxxx
>
> This patch (of 2):
>
> Mikhail has reported the following VM_BUG_ON triggered when reading sysfs
> removable state of a memory block:
>
>  page:000003d08300c000 is uninitialized and poisoned
>  page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
>  Call Trace:
>    is_mem_section_removable+0xb4/0x190
>    show_mem_removable+0x9a/0xd8
>    dev_attr_show+0x34/0x70
>    sysfs_kf_seq_show+0xc8/0x148
>    seq_read+0x204/0x480
>    __vfs_read+0x32/0x178
>    vfs_read+0x82/0x138
>    ksys_read+0x5a/0xb0
>    system_call+0xdc/0x2d8
>  Last Breaking-Event-Address:
>    is_mem_section_removable+0xb4/0x190
>  Kernel panic - not syncing: Fatal exception: panic_on_oops
>
> The reason is that the memory block spans the zone boundary and we are
> stumbling over an unitialized struct page. Fix this by enforcing zone
> range in is_mem_section_removable so that we never run away from a zone.
>
> Link: http://lkml.kernel.org/r/20190128144506.15603-2-mhocko@xxxxxxxxxx
> Signed-off-by: Michal Hocko <mhocko@xxxxxxxx>
> Reported-by: Mikhail Zaslonko <zaslonko@xxxxxxxxxxxxx>
> Debugged-by: Mikhail Zaslonko <zaslonko@xxxxxxxxxxxxx>
> Tested-by: Gerald Schaefer <gerald.schaefer@xxxxxxxxxx>
> Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@xxxxxxxxx>
> Reviewed-by: Oscar Salvador <osalvador@xxxxxxx>
> Cc: Pavel Tatashin <pasha.tatashin@xxxxxxxxxx>
> Cc: Heiko Carstens <heiko.carstens@xxxxxxxxxx>
> Cc: Martin Schwidefsky <schwidefsky@xxxxxxxxxx>
> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>
> ---
>  mm/memory_hotplug.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 21d94b5677e81..5ce0d929ff482 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1234,7 +1234,8 @@ static bool is_pageblock_removable_nolock(struct page *page)
>  bool is_mem_section_removable(unsigned long start_pfn, unsigned long nr_pages)
>  {
>  	struct page *page = pfn_to_page(start_pfn);
> -	struct page *end_page = page + nr_pages;
> +	unsigned long end_pfn = min(start_pfn + nr_pages, zone_end_pfn(page_zone(page)));
> +	struct page *end_page = pfn_to_page(end_pfn);
>
>  	/* Check the starting page of each pageblock within the range */
>  	for (; page < end_page; page = next_active_pageblock(page)) {
> --
> 2.19.1
>

-- 
Sincerely yours,
Mike.
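
For readers who want to see the clamping idea from the quoted hunk in isolation, here is a minimal userspace sketch. It is not kernel code: struct toy_zone and the toy_* helpers are hypothetical stand-ins invented for illustration, and the only thing borrowed from the patch is the min(start_pfn + nr_pages, zone end) computation. The walk below steps by a fixed pageblock size over pfns, which is a simplification and not what next_active_pageblock() does.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct zone; only the span matters. */
struct toy_zone {
    unsigned long start_pfn;      /* first pfn covered by the zone */
    unsigned long spanned_pages;  /* number of pfns the zone spans */
};

/* Same idea as zone_end_pfn(): one past the last pfn of the zone. */
static unsigned long toy_zone_end_pfn(const struct toy_zone *z)
{
    return z->start_pfn + z->spanned_pages;
}

/* Clamp the scan range so it never runs past the zone holding start_pfn. */
static unsigned long toy_clamped_end_pfn(const struct toy_zone *z,
                                         unsigned long start_pfn,
                                         unsigned long nr_pages)
{
    unsigned long end_pfn = start_pfn + nr_pages;
    unsigned long zone_end = toy_zone_end_pfn(z);

    return end_pfn < zone_end ? end_pfn : zone_end;
}

/*
 * Count the pageblock start pfns visited in [start_pfn, clamped end).
 * Assumption: fixed-size steps; the kernel walk is more involved.
 */
static unsigned long toy_count_scanned_blocks(const struct toy_zone *z,
                                              unsigned long start_pfn,
                                              unsigned long nr_pages,
                                              unsigned long pageblock_nr_pages)
{
    unsigned long end_pfn = toy_clamped_end_pfn(z, start_pfn, nr_pages);
    unsigned long pfn, visited = 0;

    for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages)
        visited++;
    return visited;
}
```

With a zone that ends in the middle of the requested range, the clamped end lands on the zone end and the walk stops there instead of touching page frames beyond the zone. When the range lies fully inside the zone, the clamp is a no-op and the whole range is walked.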