On Wed, Nov 29, 2017 at 11:33:07AM +0100, Peter Zijlstra wrote: > static inline void __native_flush_tlb(void) > { > + flush_user_asid(this_cpu_read(cpu_tlbstate.loaded_mm_asid)); > > /* > + * If current->mm == NULL then we borrow a mm > + * which may change during a task switch and > + * therefore we must not be preempted while we > + * write CR3 back: > */ > + preempt_disable(); > + native_write_cr3(__native_read_cr3()); > + preempt_enable(); > + /* > + * Does not need tlb_flush_shared_nonglobals() > + * since the CR3 write without PCIDs flushes all > + * non-globals. > + */ > + return; OK, so seeing that comment today made me realize I had so far failed to audit the whole flush user vs flush kernel thing. In short the above comment is complete crap. > } The longer story is that: flush_tlb_all() flush_tlb_kernel_range() need to flush kernel pages and thus flush _all_ the (kernel) ASIDs. Whereas: flush_tlb_mm() flush_tlb_range() flush_tlb_page() Only flush user pages, and thus only need to flush the respective user and kernel ASID. The last 3 all map to flush_tlb_mm_range() which, through flush_tlb_func_{local,remote} ends up in flush_tlb_func_common(), which then uses either __flush_tlb() or __flush_tlb_single(). Both __flush_tlb() (the above function) and __flush_tlb_single() only (need to) flush the 2 ASIDs that contain the user mapping. Now the problem is that flush_tlb_kernel_range() is implemented using either __flush_tlb_all() or __flush_tlb_single(), and it is that last use that is buggered. So at the very least we need the below to cure things, but there is another inconsistency; do_flush_tlb_all() is used by both flush_tlb_all() and flush_tlb_kernel_range() and increments NR_TLB_*, do_kernel_range_flush() OTOH does not increment NR_TLB_*. I'm not fixing that, but I'll leave a comment around or something, so we can later try and figure out what exact statistics we want. --- diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h index 9587722162ee..ccaf6e126582 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -388,12 +388,6 @@ static inline void __native_flush_tlb(void) preempt_disable(); native_write_cr3(__native_read_cr3()); preempt_enable(); - /* - * Does not need tlb_flush_shared_nonglobals() - * since the CR3 write without PCIDs flushes all - * non-globals. - */ - return; } static inline void __native_flush_tlb_global_irq_disabled(void) diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 122c48fa6012..24bd86118b46 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -609,6 +609,8 @@ static void do_kernel_range_flush(void *info) /* flush range by one by one 'invlpg' */ for (addr = f->start; addr < f->end; addr += PAGE_SIZE) __flush_tlb_single(addr); + + tlb_flush_shared_nonglobals(); } void flush_tlb_kernel_range(unsigned long start, unsigned long end) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>