Re: Dirty/Access bits vs. page content

On Thu, 24 Apr 2014, Linus Torvalds wrote:
> On Thu, Apr 24, 2014 at 11:40 AM, Hugh Dickins <hughd@xxxxxxxxxx> wrote:
> > safely with page_mkclean(), as it stands at present anyway.
> >
> > I think that (in the exceptional case when a shared file pte_dirty has
> > been encountered, and this mm is active on other cpus) zap_pte_range()
> > needs to flush TLB on other cpus of this mm, just before its
> > pte_unmap_unlock(): then it respects the usual page_mkclean() protocol.
> >
> > Or has that already been rejected earlier in the thread,
> > as too costly for some common case?
> 
> Hmm. The problem is that right now we actually try very hard to batch
> as much as possible in order to avoid extra TLB flushes (we limit it
> to around 10k pages per batch, but that's still a *lot* of pages). The
> TLB flush IPI calls are noticeable under some loads.
> 
> And it's certainly much too much to free 10k pages under a spinlock.
> The latencies would be horrendous.

There is no need to free all the pages immediately after doing the
TLB flush: that's merely how it's structured at present; page freeing
can be left until the end as now, or when out from under the spinlock.

What's sadder, I think, is that we would have to flush TLB for each
page table spanned by the mapping (if other cpus are really active);
but that's still much better batching than what page_mkclean() itself
does (none).
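
To make the structure I mean concrete, here is a minimal user-space
sketch; it is a model for counting flushes, not kernel code. All the
names in it (zap_one_table, zap_range, struct zap_stats) are made up
for illustration, PTES_PER_TABLE = 512 is an assumption, and the
ordinary end-of-batch TLB flush is deliberately not modelled. It only
shows that the extra flush happens at most once per page table, just
before the pte lock is dropped, while page freeing stays deferred.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PTES_PER_TABLE 512 /* assumed: 4K pages, 512 ptes per page table */

struct zap_stats {
	size_t early_flushes;    /* flushes issued just before dropping a pte lock */
	size_t freed_under_lock; /* pages freed while a pte lock was held */
	size_t freed_after;      /* pages freed once all locks were dropped */
};

/* Model zapping the ptes of one page table; returns the pages queued. */
static size_t zap_one_table(const bool *dirty_shared, size_t nptes,
			    bool mm_active_elsewhere, struct zap_stats *st)
{
	bool saw_dirty = false;
	size_t queued = 0;

	/* the pte lock would be taken here */
	for (size_t i = 0; i < nptes; i++) {
		if (dirty_shared[i])
			saw_dirty = true; /* shared file pte found dirty */
		queued++; /* page is only queued, never freed under the lock */
	}
	if (saw_dirty && mm_active_elsewhere)
		st->early_flushes++; /* flush other cpus before unlocking */
	/* the pte lock would be dropped here */
	return queued;
}

/* Model zapping a range that starts on a page table boundary. */
static void zap_range(const bool *dirty_shared, size_t npages,
		      bool mm_active_elsewhere, struct zap_stats *st)
{
	size_t queued = 0;

	st->early_flushes = 0;
	st->freed_under_lock = 0;
	st->freed_after = 0;

	for (size_t off = 0; off < npages; off += PTES_PER_TABLE) {
		size_t n = npages - off;

		if (n > PTES_PER_TABLE)
			n = PTES_PER_TABLE;
		queued += zap_one_table(dirty_shared + off, n,
					mm_active_elsewhere, st);
	}

	/* Out from under every pte lock: free the whole batch now. */
	st->freed_after = queued;
	assert(st->freed_under_lock == 0);
}
```

With three page tables' worth of pages and dirty ptes in the first and
third, the model issues two early flushes when other cpus are active
and none when they are not; in both cases every page is freed only
after the locks are gone.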

> 
> We could add some special logic that only triggers for the dirty pages
> case, but it would still have to handle the case of "we batched up
> 9000 clean pages, and then we hit a dirty page", so it would get
> rather costly quickly.
> 
> Or we could have a separate array for dirty pages, and limit those to
> a much smaller number, and do just the dirty pages under the lock, and
> then the rest after releasing the lock. Again, a fair amount of new
> complexity.
> 
> I would almost prefer to have some special (per-mapping?) lock or
> something, and make page_mkclean() be serialized with the unmapping
> case.

Yes, that might be a possibility.

Hugh