* Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> On Wed, Nov 14, 2012 at 12:50 AM, Ingo Molnar <mingo@xxxxxxxxxx> wrote:
> >
> > What do you guys think about this mprotect() optimization?
>
> Hmm..
>
> If this is mainly about just avoiding the TLB flushing, I do
> wonder if it might not be more interesting to try to be much
> more aggressive.
>
> As noted elsewhere, we should just notice when vm_page_prot
> doesn't change at all - even if 'flags' change, it is possible
> that the actual low-level page protection bits do not (due to
> the X=R issue).
>
> But even *more* aggressively, how about looking at
>
>  - not flushing the TLB at all if the bits become more
>    permissive (taking the TLB micro-fault and letting the CPU
>    just update it on its own)
>
>  - even *more* aggressive: if the bits become strictly more
>    restrictive, how about not flushing the TLB at all, *and*
>    not even changing the page tables, and just teaching the
>    page fault code to do it lazily at fault time?
>
> Now, the "change protections lazily" might actually be a huge
> performance problem with the page fault overhead dwarfing any
> TLB flush costs, but we don't really know, do we? It might be
> worth trying out.

Not flushing might be a good idea when ptes get weaker
protections - and maybe some CPU models notice the pte
modification in memory, match it to the corresponding TLB entry
and flush that entry on their own? Even if they don't guarantee
it architecturally, they might have it as an optimization that
works most of the time.

But I'd prefer to keep any such patch separate from these
patches, and maybe even keep them per arch and per CPU model. I
have instrumented and made sure that *these* patches do help
visibly - but determining whether skipping the TLB flush when
protections are made more permissive actually helps is a lot
harder to do ... there could be per arch differences, even per
CPU model differences, depending on TLB size, CPU features, etc.

For unthreaded process environments mprotect() is pretty neat
already.

For small/midsize mprotect()s in threaded environments there
are two big costs:

 - The down_write(mm->sem)/up_write(mm->sem) serializes between
   threads. Technically this could be improved, as the most
   expensive parts of mprotect() are really safe via
   down_read() - the only exception appears to be:

	vma->vm_flags = newflags;
	vma->vm_page_prot = pgprot_modify(vma->vm_page_prot,
					  vm_get_page_prot(newflags));

   and that could be serialized using a spinlock, say the
   pagetable lock. But it's a lot of footwork factoring out the
   vma->vm_page_prot users, and for each such place we'd have to
   consider whether slowing it down is less of a problem than
   the benefit of speeding up mprotect(). So I wouldn't
   personally go there - dragons and all that.

 - The TLB flush, if done on some highly threaded workload like
   a JVM with threads live on many other CPUs, is a global TLB
   flush, with IPIs sent everywhere and the result waited for.
   This could be improved even if we don't do your very
   aggressive optimization, unless I'm missing something: we
   could still flush locally and send the IPIs, but we don't
   have to *wait* for them when we weaken protections, right?

Thanks,

	Ingo
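
For illustration, a minimal sketch (not from the thread) of the
"notice when vm_page_prot doesn't change at all" check Linus
suggests above. vm_get_page_prot() and pgprot_val() are real
kernel interfaces; the helper name is made up, and the comparison
is simplified - it ignores extra bits that pgprot_modify() may
have folded into vm_page_prot:

	#include <linux/mm.h>

	/*
	 * Hypothetical helper: compare the low-level hardware
	 * protection bits rather than vm_flags.  On x86 CPUs
	 * without NX, for example, PROT_READ and
	 * PROT_READ|PROT_EXEC map to identical pte bits (the
	 * "X=R issue"), so nothing in the TLB becomes stale.
	 */
	static bool hw_prot_bits_unchanged(struct vm_area_struct *vma,
					   unsigned long newflags)
	{
		pgprot_t newprot = vm_get_page_prot(newflags);

		/* Equal raw bits => cached TLB entries stay valid. */
		return pgprot_val(newprot) == pgprot_val(vma->vm_page_prot);
	}

A check along these lines would let mprotect() skip both the pte
rewrite and the flush for flags-only changes.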
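
The spinlock serialization mentioned in the first bullet could
look roughly like the sketch below. mm->page_table_lock and
pgprot_modify() exist; the helper is hypothetical, and it is only
safe under the (big) assumption that every vma->vm_page_prot
reader has been converted to take the same lock:

	#include <linux/mm.h>

	/*
	 * Hypothetical sketch: do the vma update quoted above
	 * under the page-table lock, so that mprotect() could
	 * hold mm->sem for read instead of write.
	 */
	static void vma_update_prot(struct vm_area_struct *vma,
				    unsigned long newflags)
	{
		struct mm_struct *mm = vma->vm_mm;

		spin_lock(&mm->page_table_lock);
		vma->vm_flags = newflags;
		vma->vm_page_prot = pgprot_modify(vma->vm_page_prot,
						  vm_get_page_prot(newflags));
		spin_unlock(&mm->page_table_lock);
	}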
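
And the "send the IPIs but don't wait for them" idea from the
second bullet might be prototyped along these lines.
smp_call_function()'s third argument really is a wait flag and
local_flush_tlb() is a real (x86) primitive, but the function
names are invented and a real patch would target mm_cpumask(mm)
rather than every online CPU:

	#include <linux/smp.h>
	#include <asm/tlbflush.h>

	/* Runs on each remote CPU via IPI. */
	static void ipi_flush_tlb(void *info)
	{
		local_flush_tlb();
	}

	/*
	 * Hypothetical sketch: flush the local TLB immediately,
	 * kick the other CPUs, but pass wait == 0 so we don't
	 * spin until they acknowledge.  Acceptable only when
	 * protections get *weaker*: a CPU still holding the
	 * stale, more restrictive entry can at worst take a
	 * spurious fault, never touch memory it shouldn't.
	 */
	static void flush_tlb_weaken_nowait(void)
	{
		local_flush_tlb();
		smp_call_function(ipi_flush_tlb, NULL, 0 /* don't wait */);
	}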