On Thu, Jul 18, 2019 at 2:17 AM Joerg Roedel <jroedel@xxxxxxx> wrote:
>
> Hi Andy,
>
> On Wed, Jul 17, 2019 at 02:24:09PM -0700, Andy Lutomirski wrote:
> > On Wed, Jul 17, 2019 at 12:14 AM Joerg Roedel <joro@xxxxxxxxxx> wrote:
> > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > index 4fa8d84599b0..322b11a374fd 100644
> > > --- a/mm/vmalloc.c
> > > +++ b/mm/vmalloc.c
> > > @@ -132,6 +132,8 @@ static void vunmap_page_range(unsigned long addr, unsigned long end)
> > >  			continue;
> > >  		vunmap_p4d_range(pgd, addr, next);
> > >  	} while (pgd++, addr = next, addr != end);
> > > +
> > > +	vmalloc_sync_all();
> > >  }
> >
> > I'm confused.  Shouldn't the code in _vm_unmap_aliases handle this?
> > As it stands, won't your patch hurt performance on x86_64?  If x86_32
> > is a special snowflake here, maybe flush_tlb_kernel_range() should
> > handle this?
>
> Imo this is the logical place to handle this. The code first unmaps the
> area from the init_mm page-table and then syncs that page-table to all
> other page-tables in the system, so one place to update the page-tables.

I find it problematic that there is no meaningful documentation of what
vmalloc_sync_all() is supposed to do.  The closest thing I can find is
this comment on sync_global_pgds(), which the x86_64 implementation
calls:

/*
 * When memory was added make sure all the processes MM have
 * suitable PGD entries in the local PGD level page.
 */
void sync_global_pgds(unsigned long start, unsigned long end)
{

That comment is about memory hotplug and is obviously entirely
inapplicable here.  If I'm understanding correctly, the underlying
issue is that the vmalloc fault mechanism can propagate PGD entry
*addition* to the other page-tables, but nothing (not even
flush_tlb_kernel_range()) propagates PGD entry *removal*.

I find it suspicious that only x86 seems to have this problem.  How do
other architectures handle unmapping in the vmalloc area?

At the very least, I think this series needs a comment in
vmalloc_sync_all() explaining exactly what the function promises to do.
But maybe a better fix is to make flush_tlb_kernel_range() sync the
page-tables whenever the flushed range overlaps the vmalloc area.  Or,
even better, improve x86_32 the way we did x86_64: adjust the memory
mapping code such that top-level paging entries are never deleted in
the first place.
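
For concreteness, the flush_tlb_kernel_range() variant could look
something like this in arch/x86/mm/tlb.c -- entirely untested, and the
overlap check is just a sketch of the idea, not a real patch:

void flush_tlb_kernel_range(unsigned long start, unsigned long end)
{
	/*
	 * A PGD-level change in the vmalloc area may not have been
	 * propagated to all page-tables yet, so sync them before
	 * flushing the TLB.  VMALLOC_START/VMALLOC_END bound the
	 * vmalloc area on both x86_32 and x86_64.
	 */
	if (start < VMALLOC_END && end > VMALLOC_START)
		vmalloc_sync_all();

	/* ... existing TLB flush logic unchanged ... */
}

That would keep the sync out of the common vunmap path on
configurations that don't need it, at the cost of doing it on every
kernel range flush that touches the vmalloc area.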