On Thu, Apr 24, 2014 at 11:40 AM, Hugh Dickins <hughd@xxxxxxxxxx> wrote:
> safely with page_mkclean(), as it stands at present anyway.
>
> I think that (in the exceptional case when a shared file pte_dirty has
> been encountered, and this mm is active on other cpus) zap_pte_range()
> needs to flush TLB on other cpus of this mm, just before its
> pte_unmap_unlock(): then it respects the usual page_mkclean() protocol.
>
> Or has that already been rejected earlier in the thread,
> as too costly for some common case?

Hmm. The problem is that right now we actually try very hard to batch as
much as possible in order to avoid extra TLB flushes (we limit it to
around 10k pages per batch, but that's still a *lot* of pages). The TLB
flush IPI calls are noticeable under some loads.

And it's certainly much too much to free 10k pages under a spinlock. The
latencies would be horrendous.

We could add some special logic that only triggers for the dirty-page
case, but it would still have to handle the case of "we batched up 9000
clean pages, and then we hit a dirty page", so it would get rather
costly quickly.

Or we could have a separate array for dirty pages, limit those to a much
smaller number, do just the dirty pages under the lock, and then do the
rest after releasing the lock. Again, a fair amount of new complexity.

I would almost prefer to have some special (per-mapping?) lock or
something, and make page_mkclean() serialize with the unmapping case.

                  Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx. For more info on Linux MM,
see: http://www.linux-mm.org/ .