On Thu, Jun 1, 2017 at 7:33 AM, Jerome Glisse <jglisse@xxxxxxxxxx> wrote:
> On Thu, Jun 01, 2017 at 06:59:04AM -0700, Andy Lutomirski wrote:
>> On Wed, May 31, 2017 at 8:03 AM, Jérôme Glisse <jglisse@xxxxxxxxxx> wrote:
>> > Since af2cf278ef4f ("Don't remove PGD entries in remove_pagetable()")
>> > we no longer clean up stale pgd entries and thus the BUG_ON() inside
>> > sync_global_pgds() is wrong.
>> >
>> > This patch removes the BUG_ON() and unconditionally updates stale pgd
>> > entries.
>> >
>> > Signed-off-by: Jérôme Glisse <jglisse@xxxxxxxxxx>
>> > Cc: Ingo Molnar <mingo@xxxxxxxxxx>
>> > Cc: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
>> > ---
>> >  arch/x86/mm/init_64.c | 7 +------
>> >  1 file changed, 1 insertion(+), 6 deletions(-)
>> >
>> > diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
>> > index ff95fe8..36b9020 100644
>> > --- a/arch/x86/mm/init_64.c
>> > +++ b/arch/x86/mm/init_64.c
>> > @@ -123,12 +123,7 @@ void sync_global_pgds(unsigned long start, unsigned long end)
>> >              pgt_lock = &pgd_page_get_mm(page)->page_table_lock;
>> >              spin_lock(pgt_lock);
>> >
>> > -            if (!p4d_none(*p4d_ref) && !p4d_none(*p4d))
>> > -                BUG_ON(p4d_page_vaddr(*p4d)
>> > -                       != p4d_page_vaddr(*p4d_ref));
>> > -
>> > -            if (p4d_none(*p4d))
>> > -                set_p4d(p4d, *p4d_ref);
>> > +            set_p4d(p4d, *p4d_ref);
>>
>> If we have a mismatch in the vmalloc range, vmalloc_fault is going to
>> screw up and we'll end up using incorrect page tables.
>>
>> What's causing the mismatch? If you're hitting this BUG in practice,
>> I suspect we have a bug elsewhere.
>
> No bug elsewhere. Simply hotplug memory, then hotremove the same memory
> you just hotplugged, then hotplug it again and you will trigger this: on
> the first hotplug we allocate p4d/pud for the struct pages area, then on
> hot remove we free that memory and clear the p4d/pud in the mm_init pgd
> but not in any of the other pgds.

That sounds like a bug to me.

Either we should remove the stale entries and fix all the attendant races,
or we should unconditionally allocate second-highest-level kernel page
tables in unremovable memory and never free them. I prefer the latter
even though it's slightly slower.

> So at that point the next hotplug
> will trigger the BUG because of stale entries from the first hotplug.

By the time we have a pgd with an entry pointing off into the woods,
we've already lost. Removing the BUG just hides the problem.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body
to majordomo@xxxxxxxxx. For more info on Linux MM, see:
http://www.linux-mm.org/ .