On Thu, Jun 1, 2017 at 8:11 AM, Jerome Glisse <jglisse@xxxxxxxxxx> wrote: > On Thu, Jun 01, 2017 at 07:38:25AM -0700, Andy Lutomirski wrote: >> On Thu, Jun 1, 2017 at 7:33 AM, Jerome Glisse <jglisse@xxxxxxxxxx> wrote: >> > On Thu, Jun 01, 2017 at 06:59:04AM -0700, Andy Lutomirski wrote: >> >> On Wed, May 31, 2017 at 8:03 AM, Jérôme Glisse <jglisse@xxxxxxxxxx> wrote: >> >> > Since af2cf278ef4f ("Don't remove PGD entries in remove_pagetable()") >> >> > we no longer cleanup stall pgd entries and thus the BUG_ON() inside >> >> > sync_global_pgds() is wrong. >> >> > >> >> > This patch remove the BUG_ON() and unconditionaly update stall pgd >> >> > entries. >> >> > >> >> > Signed-off-by: Jérôme Glisse <jglisse@xxxxxxxxxx> >> >> > Cc: Ingo Molnar <mingo@xxxxxxxxxx> >> >> > Cc: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx> >> >> > --- >> >> > arch/x86/mm/init_64.c | 7 +------ >> >> > 1 file changed, 1 insertion(+), 6 deletions(-) >> >> > >> >> > diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c >> >> > index ff95fe8..36b9020 100644 >> >> > --- a/arch/x86/mm/init_64.c >> >> > +++ b/arch/x86/mm/init_64.c >> >> > @@ -123,12 +123,7 @@ void sync_global_pgds(unsigned long start, unsigned long end) >> >> > pgt_lock = &pgd_page_get_mm(page)->page_table_lock; >> >> > spin_lock(pgt_lock); >> >> > >> >> > - if (!p4d_none(*p4d_ref) && !p4d_none(*p4d)) >> >> > - BUG_ON(p4d_page_vaddr(*p4d) >> >> > - != p4d_page_vaddr(*p4d_ref)); >> >> > - >> >> > - if (p4d_none(*p4d)) >> >> > - set_p4d(p4d, *p4d_ref); >> >> > + set_p4d(p4d, *p4d_ref); >> >> >> >> If we have a mismatch in the vmalloc range, vmalloc_fault is going to >> >> screw up and we'll end up using incorrect page tables. >> >> >> >> What's causing the mismatch? If you're hitting this BUG in practice, >> >> I suspect we have a bug elsewhere. >> > >> > No bug elsewhere, simply hotplug memory then hotremove same memory you >> > just hotplugged then hotplug it again and you will trigger this as on >> > the first hotplug we allocate p4d/pud for the struct pages area, then on >> > hot remove we free that memory and clear the p4d/pud in the mm_init pgd >> > but not in any of the other pgds. >> >> That sounds like a bug to me. Either we should remove the stale >> entries and fix all the attendant races, or we should unconditionally >> allocate second-highest-level kernel page tables in unremovable memory >> and never free them. I prefer the latter even though it's slightly >> slower. >> >> > So at that point the next hotplug >> > will trigger the BUG because of stall entries from the first hotplug. >> >> By the time we have a pgd with an entry pointing off into the woods, >> we've already lost. Removing the BUG just hides the problem. > > Then why did you sign of on af2cf278ef4f9289f88504c3e03cb12f76027575 > this is what introduced this change in behavior. > > If i understand you correctly you want to avoid deallocating p4d/pud > directory page when hotremove happen ? But this happen in common non > arch specific code vmemmap_populate_basepages() thought we can make > x86 vmemmap_populate() code arch different thought. > > So i am not sure how to proceed here. My first attempt was to undo > af2cf278ef4f9289f88504c3e03cb12f76027575 so that we keep all pgds > synchronize. No if we want special case p4d/pud allocation that's > a different approach all together. > The intent of that patch was to leave the pud table allocated and to leave the pgd entry in place. The code appears to do that. I'm not terribly familiar with memory hotplug. Where's the problematic code? I suspect there's a fairly simple bug somewhere that needs fixing. kernel_physical_mapping_init() looks right, though. --Andy -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href