On Thu, Apr 24, 2014 at 11:40 AM, Hugh Dickins <hughd@xxxxxxxxxx> wrote:
> safely with page_mkclean(), as it stands at present anyway.
>
> I think that (in the exceptional case when a shared file pte_dirty has
> been encountered, and this mm is active on other cpus) zap_pte_range()
> needs to flush TLB on other cpus of this mm, just before its
> pte_unmap_unlock(): then it respects the usual page_mkclean() protocol.
>
> Or has that already been rejected earlier in the thread,
> as too costly for some common case?

Hmm. The problem is that right now we actually try very hard to batch as
much as possible in order to avoid extra TLB flushes (we limit it to
around 10k pages per batch, but that's still a *lot* of pages). The TLB
flush IPI calls are noticeable under some loads.

And it's certainly much too much to free 10k pages under a spinlock. The
latencies would be horrendous.

We could add some special logic that only triggers for the dirty-page
case, but it would still have to handle the case of "we batched up 9000
clean pages, and then we hit a dirty page", so it would get rather
costly quickly.

Or we could have a separate array for dirty pages, limit those to a much
smaller number, do just the dirty pages under the lock, and then do the
rest after releasing the lock. Again, a fair amount of new complexity.

I would almost prefer to have some special (per-mapping?) lock or
something, and make page_mkclean() serialize with the unmapping case.

                  Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx. For more info on Linux MM,
see: http://www.linux-mm.org/ .