On Sun, 27 Apr 2014, Hugh Dickins wrote:
>
> But woke with a panic attack that we have overlooked the question
> of how page reclaim's page_mapped() checks are serialized.
> Perhaps this concern will evaporate with the morning dew,
> perhaps it will not...

It was a real concern, but we happen to be rescued by the
innocuous-looking is_page_cache_freeable() check at the beginning
of pageout(): which will deserve its own comment, but that can
follow later.

My concern was with page reclaim's shrink_page_list() racing against
munmap's or exit's (or madvise's) zap_pte_range() unmapping the page.

Once zap_pte_range() has cleared the pte from a vma, neither
try_to_unmap() nor page_mkclean() will see that vma as containing
the page, so neither will do its own TLB flush of the cpus involved,
before proceeding to writepage.

Linus's patch (serializing with ptlock) or my patch (serializing
with i_mmap_mutex) both almost fix that, but it seemed not entirely:
because try_to_unmap() is only called when page_mapped(), and
page_mkclean() quits early without taking locks when !page_mapped().

So in the interval when zap_pte_range() has brought page_mapcount()
down to 0, but not yet flushed TLB on all mapping cpus, it looked as
if we still had a problem - neither try_to_unmap() nor page_mkclean()
would take the lock either of us rely upon for serialization.

But pageout()'s preliminary is_page_cache_freeable() check makes
it safe in the end: although page_mapcount() has gone down to 0,
page_count() remains raised until the free_pages_and_swap_cache()
after the TLB flush.

So I now believe we're safe after all with either patch, and happy
for Linus to go ahead with his.

Peter, returning at last to your question of whether we could exempt
shmem from the added overhead of either patch. Until just now I
thought not, because of the possibility that the shmem_writepage()
could occur while one of the mm's cpus remote from the zap_pte_range()
cpu was still modifying the page.
But now that I see the role played by is_page_cache_freeable(), and
of course the zapping end has never dropped its reference on the page
before the TLB flush, however late that occurred, hmmm, maybe yes,
shmem can be exempted. But I'd prefer to dwell on that a bit longer:
we can add that as an optimization later if it holds up to scrutiny.

Hugh
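
For anyone following the counting argument, here is a toy userspace
model of it. This is a sketch under stated assumptions, not kernel code:
struct toy_page and every toy_* helper are hypothetical names invented
for illustration, and the "== 2" test is only modelled loosely on what
is_page_cache_freeable() checks (one reference from the page cache, one
from reclaim's isolation, optionally one more for private data). It
shows the window where the map count has reached 0 while the reference
count is still raised, so the page is not yet treated as freeable.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page: only the counters discussed. */
struct toy_page {
	int mapcount;      /* models page_mapcount(): ptes mapping the page */
	int count;         /* models page_count(): all references held */
	bool has_private;  /* models page_has_private(): e.g. buffer heads */
};

/*
 * Loosely modelled on is_page_cache_freeable(): the page is freeable
 * only if the remaining references are the page cache's and the
 * isolating caller's (plus one if private data is attached).
 */
static bool toy_is_page_cache_freeable(const struct toy_page *p)
{
	return p->count - (p->has_private ? 1 : 0) == 2;
}

/*
 * Models the zap_pte_range() step: the pte is cleared and the map
 * count drops, but the reference on the page is deliberately kept.
 */
static void toy_zap_pte(struct toy_page *p)
{
	assert(p->mapcount > 0);
	p->mapcount--;
}

/*
 * Models free_pages_and_swap_cache() running after the TLB flush:
 * only now is the zapping side's reference dropped.
 */
static void toy_free_after_flush(struct toy_page *p)
{
	assert(p->count > 0);
	p->count--;
}

/*
 * A page cache page mapped by one pte and isolated by reclaim:
 * one reference each from the page cache, the mapping, and reclaim.
 */
static struct toy_page toy_mapped_isolated_page(void)
{
	struct toy_page p = { .mapcount = 1, .count = 3, .has_private = false };
	return p;
}
```

Starting from toy_mapped_isolated_page(), calling toy_zap_pte() leaves
mapcount at 0 but toy_is_page_cache_freeable() still false; it only
becomes true once toy_free_after_flush() has run, which is the ordering
the argument above relies on.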