On Thu, 24 Apr 2014, Linus Torvalds wrote:
> On Thu, Apr 24, 2014 at 11:40 AM, Hugh Dickins <hughd@xxxxxxxxxx> wrote:
> > safely with page_mkclean(), as it stands at present anyway.
> >
> > I think that (in the exceptional case when a shared file pte_dirty has
> > been encountered, and this mm is active on other cpus) zap_pte_range()
> > needs to flush TLB on other cpus of this mm, just before its
> > pte_unmap_unlock(): then it respects the usual page_mkclean() protocol.
> >
> > Or has that already been rejected earlier in the thread,
> > as too costly for some common case?
>
> Hmm. The problem is that right now we actually try very hard to batch
> as much as possible in order to avoid extra TLB flushes (we limit it
> to around 10k pages per batch, but that's still a *lot* of pages). The
> TLB flush IPI calls are noticeable under some loads.
>
> And it's certainly much too much to free 10k pages under a spinlock.
> The latencies would be horrendous.

There is no need to free all the pages immediately after doing the
TLB flush: that's merely how it's structured at present; page freeing
can be left until the end as now, or when out from under the spinlock.

What's sadder, I think, is that we would have to flush TLB for each
page table spanned by the mapping (if other cpus are really active);
but that's still much better batching than what page_mkclean() itself
does (none).

>
> We could add some special logic that only triggers for the dirty pages
> case, but it would still have to handle the case of "we batched up
> 9000 clean pages, and then we hit a dirty page", so it would get
> rather costly quickly.
>
> Or we could have a separate array for dirty pages, and limit those to
> a much smaller number, and do just the dirty pages under the lock, and
> then the rest after releasing the lock. Again, a fair amount of new
> complexity.
>
> I would almost prefer to have some special (per-mapping?) lock or
> something, and make page_mkclean() be serialize with the unmapping
> case.

Yes, that might be a possibility.

Hugh
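
To make the flush-before-unlock idea above concrete, here is a rough
userspace model, not kernel code and not a proposed patch. Every name in
it (model_pte, model_stats, model_zap_table, model_zap_range) is made up
for illustration, and PTES_PER_TABLE = 512 is an assumption taken from
x86-64. It only counts how many TLB flushes would be issued under the
page table lock if the flush is forced once per page table whenever a
dirty shared-file pte is seen and other cpus are active, with page
freeing left until after the lock is dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PTES_PER_TABLE 512  /* assumption: x86-64 ptes per page table */

/* Hypothetical, heavily simplified pte: just the bits the model needs. */
struct model_pte {
    bool present;
    bool dirty;
    bool shared_file;
};

struct model_stats {
    unsigned long flushes_under_lock; /* flushes issued before "unlock" */
    unsigned long pages_deferred;     /* pages queued to be freed later */
};

/* Model of zapping one page table while its lock is conceptually held. */
static void model_zap_table(struct model_pte *ptes, size_t n,
                            bool other_cpus_active, struct model_stats *st)
{
    bool force_flush = false;

    assert(st != NULL);

    /* page table lock conceptually taken here */
    for (size_t i = 0; i < n; i++) {
        if (!ptes[i].present)
            continue;
        /* The exceptional case: a dirty pte of a shared file page. */
        if (ptes[i].dirty && ptes[i].shared_file)
            force_flush = true;
        ptes[i].present = false;
        st->pages_deferred++;  /* queued only; nothing freed under the lock */
    }

    /* At most one flush per page table, just before dropping the lock. */
    if (force_flush && other_cpus_active)
        st->flushes_under_lock++;
    /* lock conceptually dropped here; freeing happens afterwards */
}

/* Walk a range spanning several page tables, one table per lock hold. */
static struct model_stats model_zap_range(struct model_pte *ptes,
                                          size_t total,
                                          bool other_cpus_active)
{
    struct model_stats st = {0, 0};

    for (size_t off = 0; off < total; off += PTES_PER_TABLE) {
        size_t n = total - off < PTES_PER_TABLE ? total - off
                                                : PTES_PER_TABLE;
        model_zap_table(ptes + off, n, other_cpus_active, &st);
    }
    return st;
}
```

In this model a mapping spanning three page tables, with dirty shared
ptes in two of them, costs two flushes under the lock; with no other
cpus active it costs none. The real cost of each flush (the IPIs) is
not modelled at all.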