On Thu, May 09, 2019 at 11:35:55AM -0700, Yang Shi wrote:
>
>
> On 5/9/19 3:54 AM, Peter Zijlstra wrote:
> > On Thu, May 09, 2019 at 12:38:13PM +0200, Peter Zijlstra wrote:
> >
> > > That's tlb->cleared_p*, and yes agreed. That is, right until some
> > > architecture has level dependent TLBI instructions, at which point we'll
> > > need to have them all set instead of cleared.
> > > Anyway; am I correct in understanding that the actual problem is that
> > > we've cleared freed_tables and the ARM64 tlb_flush() will then not
> > > invalidate the cache and badness happens?
> > >
> > > Because so far nobody has actually provided a coherent description of
> > > the actual problem we're trying to solve. But I'm thinking something
> > > like the below ought to do.
> > There's another 'fun' issue I think. For architectures like ARM that
> > have range invalidation and care about VM_EXEC for I$ invalidation, the
> > below doesn't quite work right either.
> >
> > I suspect we also have to force: tlb->vma_exec = 1.
>
> Isn't the below code in tlb_flush enough to guarantee this?
>
> ...
>         } else if (tlb->end) {
>                 struct vm_area_struct vma = {
>                         .vm_mm = tlb->mm,
>                         .vm_flags = (tlb->vma_exec ? VM_EXEC    : 0) |
>                                     (tlb->vma_huge ? VM_HUGETLB : 0),
>                 };

Only when vma_exec is actually set... and there is no guarantee of that
in the concurrent path (the last VMA we iterate might not be executable,
but the TLBI we've missed might have been).

More specifically, the 'fun' case is if we have no present page in the whole
executable page, in that case tlb->end == 0 and we never call into the
arch code, never giving it a chance to flush I$.
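
To make the two failure modes above concrete, here is a minimal user-space sketch. It is not kernel code: struct toy_gather, struct toy_vma and every toy_* function are hypothetical stand-ins invented for illustration, and the model keeps only the fields discussed in this thread (the range end, vma_exec). The last helper assumes that forcing vma_exec alone is not sufficient and that the range has to be made non-empty as well; that is my reading of the tlb->end == 0 case, not something a posted patch confirms.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for struct mmu_gather; not kernel code. */
struct toy_gather {
	unsigned long start;     /* lowest address with a page we unmapped */
	unsigned long end;       /* 0 means "no present page was seen" */
	bool vma_exec;           /* reflects only the VMA visited last */
	bool arch_flush_called;  /* did we reach the arch hook at all? */
	bool icache_flushed;     /* did the arch hook see VM_EXEC? */
};

/* Hypothetical VMA: an address range, an exec flag, a count of present pages. */
struct toy_vma {
	unsigned long start, end;
	bool exec;
	unsigned int present_pages;
};

/* Start from an empty range with all flags cleared. */
void toy_reset(struct toy_gather *tlb)
{
	tlb->start = ~0UL;
	tlb->end = 0;
	tlb->vma_exec = false;
	tlb->arch_flush_called = false;
	tlb->icache_flushed = false;
}

/* Visit one VMA: the exec flag is overwritten, the range grows only
 * when the VMA still has present pages. */
void toy_walk_vma(struct toy_gather *tlb, const struct toy_vma *vma)
{
	assert(vma->start < vma->end);
	tlb->vma_exec = vma->exec;
	if (vma->present_pages == 0)
		return; /* e.g. a concurrent unmapper already cleared the PTEs */
	if (vma->start < tlb->start)
		tlb->start = vma->start;
	if (vma->end > tlb->end)
		tlb->end = vma->end;
}

/* Same shape as the quoted branch: nothing happens unless tlb->end != 0,
 * and I$ is only handled when vma_exec is set. */
void toy_tlb_flush(struct toy_gather *tlb)
{
	if (!tlb->end)
		return;
	tlb->arch_flush_called = true;
	if (tlb->vma_exec)
		tlb->icache_flushed = true;
}

/* Assumed mitigation: force vma_exec and a non-empty range before flushing. */
void toy_force_exec_flush(struct toy_gather *tlb,
			  unsigned long start, unsigned long end)
{
	assert(start < end);
	tlb->start = start;
	tlb->end = end;
	tlb->vma_exec = true;
}
```

Walking only an executable VMA with no present pages leaves end at 0, so toy_tlb_flush() returns before reaching the arch hook. Walking that VMA followed by a non-executable one with a present page does reach the hook, but with vma_exec clear, so the I$ step is still skipped. Only the forced variant reaches it in this model.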