On Sun, Apr 27, 2014 at 01:09:54PM -0700, Hugh Dickins wrote:
> On Sun, 27 Apr 2014, Hugh Dickins wrote:
> >
> > But woke with a panic attack that we have overlooked the question
> > of how page reclaim's page_mapped() checks are serialized.
> > Perhaps this concern will evaporate with the morning dew,
> > perhaps it will not...
>
> It was a real concern, but we happen to be rescued by the
> innocuous-looking is_page_cache_freeable() check at the beginning
> of pageout(): which will deserve its own comment, but that can
> follow later.
>
> My concern was with page reclaim's shrink_page_list() racing against
> munmap's or exit's (or madvise's) zap_pte_range() unmapping the page.
>
> Once zap_pte_range() has cleared the pte from a vma, neither
> try_to_unmap() nor page_mkclean() will see that vma as containing
> the page, so neither will do its own flush TLB of the cpus involved,
> before proceeding to writepage.
>
> Linus's patch (serializing with ptlock) or my patch (serializing
> with i_mmap_mutex) both almost fix that, but it seemed not entirely:
> because try_to_unmap() is only called when page_mapped(), and
> page_mkclean() quits early without taking locks when !page_mapped().

Argh!! very good spotting that.

> So in the interval when zap_pte_range() has brought page_mapcount()
> down to 0, but not yet flushed TLB on all mapping cpus, it looked as
> if we still had a problem - neither try_to_unmap() nor page_mkclean()
> would take the lock either of us rely upon for serialization.
>
> But pageout()'s preliminary is_page_cache_freeable() check makes
> it safe in the end: although page_mapcount() has gone down to 0,
> page_count() remains raised until the free_pages_and_swap_cache()
> after the TLB flush.
>
> So I now believe we're safe after all with either patch, and happy
> for Linus to go ahead with his.

OK, so I'm just not seeing that atm. Will have another peek later,
hopefully when more fully awake.

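For anyone following along, the ordering argument quoted above can be
written down as a toy userspace model. This is only a sketch: everything
named toy_* is hypothetical and is not kernel code, and the "== 2" test
is an assumption about what is_page_cache_freeable() amounts to (the
reference held by reclaim after isolating the page plus the page cache's
own reference), ignoring buffer heads.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct page: only the two counters under discussion. */
struct toy_page {
	int count;	/* models page_count() */
	int mapcount;	/* models page_mapcount() */
};

static bool toy_page_mapped(const struct toy_page *p)
{
	return p->mapcount > 0;
}

/*
 * ASSUMPTION: freeable means only reclaim's isolation reference and the
 * page cache reference remain, i.e. nobody else still holds the page.
 */
static bool toy_is_page_cache_freeable(const struct toy_page *p)
{
	return p->count == 2;
}

/* Zap side, step 1: pte cleared and rmap dropped; the page ref is kept. */
static void toy_zap_clear_pte(struct toy_page *p)
{
	assert(p->mapcount > 0);
	p->mapcount--;
}

/* Zap side, step 2: runs only after the TLB flush; drops the page ref. */
static void toy_zap_free_after_flush(struct toy_page *p)
{
	assert(p->count > 0);
	p->count--;
}

/*
 * Walk the sequence from the quoted text. Returns true if writepage is
 * refused inside the window (mapcount already 0, TLB not yet flushed)
 * and permitted once the zapping side has dropped its reference.
 */
static bool toy_window_is_closed(void)
{
	/* page cache ref + one mapping's ref + reclaim's isolation ref */
	struct toy_page page = { .count = 3, .mapcount = 1 };
	bool blocked_in_window, allowed_after;

	toy_zap_clear_pte(&page);
	/* The window: page_mapped() is false, so no rmap lock is taken... */
	blocked_in_window = !toy_page_mapped(&page) &&
			    !toy_is_page_cache_freeable(&page);

	toy_zap_free_after_flush(&page);
	/* ...but the count only drops here, after the flush. */
	allowed_after = toy_is_page_cache_freeable(&page);

	return blocked_in_window && allowed_after;
}
```

In this model toy_window_is_closed() returns true: the raised count is
what keeps pageout() away from writepage during the window, not either
of the locks. Whether the real code paths match the model is exactly
the part that still needs checking.
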
> Peter, returning at last to your question of whether we could exempt
> shmem from the added overhead of either patch.  Until just now I
> thought not, because of the possibility that the shmem_writepage()
> could occur while one of the mm's cpus remote from zap_pte_range()
> cpu was still modifying the page.  But now that I see the role
> played by is_page_cache_freeable(), and of course the zapping end
> has never dropped its reference on the page before the TLB flush,
> however late that occurred, hmmm, maybe yes, shmem can be exempted.
>
> But I'd prefer to dwell on that a bit longer: we can add that as
> an optimization later if it holds up to scrutiny.

For sure.. No need to rush that. And if a (performance) regression shows
up in the meantime, we immediately have a good test case too :-)