Re: Dirty/Access bits vs. page content

On Thu, 24 Apr 2014, Linus Torvalds wrote:
> On Thu, Apr 24, 2014 at 11:40 AM, Hugh Dickins <hughd@xxxxxxxxxx> wrote:
> > safely with page_mkclean(), as it stands at present anyway.
> >
> > I think that (in the exceptional case when a shared file pte_dirty has
> > been encountered, and this mm is active on other cpus) zap_pte_range()
> > needs to flush TLB on other cpus of this mm, just before its
> > pte_unmap_unlock(): then it respects the usual page_mkclean() protocol.
> >
> > Or has that already been rejected earlier in the thread,
> > as too costly for some common case?
> 
> Hmm. The problem is that right now we actually try very hard to batch
> as much as possible in order to avoid extra TLB flushes (we limit it
> to around 10k pages per batch, but that's still a *lot* of pages). The
> TLB flush IPI calls are noticeable under some loads.
> 
> And it's certainly much too much to free 10k pages under a spinlock.
> The latencies would be horrendous.

There is no need to free all the pages immediately after doing the
TLB flush: that's merely how it's structured at present; page freeing
can be left until the end as now, or when out from under the spinlock.

What's sadder, I think, is that we would have to flush TLB for each
page table spanned by the mapping (if other cpus are really active);
but that's still much better batching than what page_mkclean() itself
does (none).
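
A toy userspace model of the ordering described above, offered only as
a sketch: every name in it (toy_pte, toy_stats, toy_zap_pte_range,
toy_free_batch, toy_unmap_range) is hypothetical and none of it is
kernel code. It assumes one lock section per page table, issues at most
one flush per table before the modelled unlock, and leaves all page
freeing until after the last table, outside any lock.

```c
/* Toy model only: all names are hypothetical, not kernel API. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_pte {
    bool present;
    bool dirty;              /* stands in for pte_dirty() on a shared file page */
};

struct toy_stats {
    int  flushes_under_lock; /* flushes issued before dropping the lock */
    int  pages_queued;       /* pages batched up, not yet freed */
    int  pages_freed;
    bool lock_held;          /* models the page table spinlock */
};

/*
 * Zap the ptes of one page table. If a dirty pte was seen and the mm is
 * active on other cpus, flush before the unlock. Pages are only queued
 * here, never freed. Returns true if a flush was issued.
 */
static bool toy_zap_pte_range(struct toy_pte *ptes, size_t n,
                              bool mm_active_elsewhere, struct toy_stats *st)
{
    bool force_flush = false;

    st->lock_held = true;                /* models taking the pte lock */
    for (size_t i = 0; i < n; i++) {
        if (!ptes[i].present)
            continue;
        if (ptes[i].dirty && mm_active_elsewhere)
            force_flush = true;
        ptes[i].present = false;
        ptes[i].dirty = false;
        st->pages_queued++;
    }
    if (force_flush)
        st->flushes_under_lock++;        /* models the TLB flush on other cpus */
    st->lock_held = false;               /* models pte_unmap_unlock() */
    return force_flush;
}

/* Free everything batched so far; must run outside the lock. */
static void toy_free_batch(struct toy_stats *st)
{
    assert(!st->lock_held);
    st->pages_freed += st->pages_queued;
    st->pages_queued = 0;
}

/*
 * Walk every page table spanned by the mapping, then free once at the
 * end. Returns the number of flushes issued under the lock so far.
 */
static int toy_unmap_range(struct toy_pte *ptes, size_t ntables,
                           size_t per_table, bool mm_active_elsewhere,
                           struct toy_stats *st)
{
    for (size_t t = 0; t < ntables; t++)
        toy_zap_pte_range(ptes + t * per_table, per_table,
                          mm_active_elsewhere, st);
    toy_free_batch(st);
    return st->flushes_under_lock;
}
```

In this model the flush count is bounded by the number of page tables
that contained a dirty pte, while freeing stays fully batched. The
model does not transfer the dirty bit to the page or queue pages for a
real TLB gather; it only shows the ordering of flush, unlock and free.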

> 
> We could add some special logic that only triggers for the dirty pages
> case, but it would still have to handle the case of "we batched up
> 9000 clean pages, and then we hit a dirty page", so it would get
> rather costly quickly.
> 
> Or we could have a separate array for dirty pages, and limit those to
> a much smaller number, and do just the dirty pages under the lock, and
> then the rest after releasing the lock. Again, a fair amount of new
> complexity.
> 
> I would almost prefer to have some special (per-mapping?) lock or
> something, and make page_mkclean() be serialized with the unmapping
> case.

Yes, that might be a possibility.

Hugh
