On Wed, 6 Feb 2013 16:20:40 -0800 (PST) Hugh Dickins <hughd@xxxxxxxxxx> wrote: > Martin, I'd like to say Applauded-by: Hugh Dickins <hughd@xxxxxxxxxx> > but I do have one reservation: the PageDirty business you helpfully > draw attention to in your description above. > > That makes me nervous, having a PageDirty test buried down there in > one architecture's mk_pte(). Particularly since I know the PageDirty > handling on anon/swap pages is rather odd: it works, but it's hard to > justify some of the SetPageDirtys (when we add to swap, AND when we > remove from swap): partly a leftover from 2.4 days, when vmscan worked > differently, and we had to be more careful about freeing modified pages. I tried to solved the whole thing with arch level code only. The PageDirty check in mk_pte is essential to avoid additional protection faults for tmpfs/shmem. > I did a patch a year or two ago, mainly for debugging some particular > issue by announcing "Bad page state" if ever a dirty page is freed, in > which I had to tidy that up. Now, I don't have any immediate intention > to resurrect that patch, but I'm afraid that if I did, I might interfere > with your optimization in s390's mk_pte() without realizing it. > > > --- a/arch/s390/include/asm/page.h > > +++ b/arch/s390/include/asm/page.h > > ... > > @@ -1152,8 +1190,13 > > static inline pte_t mk_pte(struct page *page, pgprot_t pgprot) > > { > > unsigned long physpage = page_to_phys(page); > > + pte_t __pte = mk_pte_phys(physpage, pgprot); > > > > - return mk_pte_phys(physpage, pgprot); > > + if ((pte_val(__pte) & _PAGE_SWW) && PageDirty(page)) { > > + pte_val(__pte) |= _PAGE_SWC; > > + pte_val(__pte) &= ~_PAGE_RO; > > + } > > + return __pte; > > } > > Am I right to think that, once you examine the mk_pte() callsites, > this actually would not be affecting anon pages, nor accounted file > pages, just tmpfs/shmem or ramfs pages read-faulted into a read-write > shared vma? (That fits with what you say above.) That it amounts to > the patch below - which I think I would prefer, because it's explicit? > (There might be one or two other places it makes a difference e.g. > replacing a writable migration entry, but those too uncommon to matter.) Anon page and accounted file pages won't need the mk_pte optimization, that is there for tmpfs/shmem. We could do that in common code as well, to make the dependency on PageDirty more obvious. > --- 3.8-rc6/mm/memory.c 2013-01-09 19:25:05.028321379 -0800 > +++ linux/mm/memory.c 2013-02-06 15:01:17.904387877 -0800 > @@ -3338,6 +3338,10 @@ static int __do_fault(struct mm_struct * > dirty_page = page; > get_page(dirty_page); > } > +#ifdef CONFIG_S390 > + else if (pte_write(entry) && PageDirty(page)) > + pte_mkdirty(entry); > +#endif > } > set_pte_at(mm, address, page_table, entry); > > And then I wonder, is that something we should do on all architectures? > On the one hand, it would save a hardware fault when and if the pte is > dirtied later; on the other hand, it seems wrong to claim pte dirty when > not (though I didn't find anywhere that would care). I don't like the fact that we are adding another CONFIG_S390, if we could pre-dirty the pte for all architectures that would be nice. It has no ill effects for s390 to make the pte dirty, I can think of no reason why it should hurt for other architectures. -- blue skies, Martin. "Reality continues to ruin my life." - Calvin. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>