On Tue, Mar 24, 2020 at 10:17:13AM -0700, Yang Shi wrote: > > > On 3/19/20 3:49 AM, Kirill A. Shutemov wrote: > > On Wed, Mar 18, 2020 at 10:39:21PM -0700, Yang Shi wrote: > > > > > > On 3/18/20 5:55 PM, Yang Shi wrote: > > > > > > > > On 3/18/20 5:12 PM, Kirill A. Shutemov wrote: > > > > > On Thu, Mar 19, 2020 at 07:19:42AM +0800, Yang Shi wrote: > > > > > > When khugepaged collapses anonymous pages, the base pages would > > > > > > be freed > > > > > > via pagevec or free_page_and_swap_cache(). But, the anonymous page may > > > > > > be added back to LRU, then it might result in the below race: > > > > > > > > > > > > CPU A CPU B > > > > > > khugepaged: > > > > > > unlock page > > > > > > putback_lru_page > > > > > > add to lru > > > > > > page reclaim: > > > > > > isolate this page > > > > > > try_to_unmap > > > > > > page_remove_rmap <-- corrupt _mapcount > > > > > > > > > > > > It looks nothing would prevent the pages from isolating by reclaimer. > > > > > Hm. Why should it? > > > > > > > > > > try_to_unmap() doesn't exclude parallel page unmapping. _mapcount is > > > > > protected by ptl. And this particular _mapcount pin is reachable for > > > > > reclaim as it's not part of usual page table tree. Basically > > > > > try_to_unmap() will never succeeds until we give up the _mapcount on > > > > > khugepaged side. > > > > I don't quite get. What does "not part of usual page table tree" means? > > > > > > > > How's about try_to_unmap() acquires ptl before khugepaged? > > The page table we are dealing with was detached from the process' page > > table tree: see pmdp_collapse_flush(). try_to_unmap() will not see the > > pte. > > A follow-up question here. pmdp_collapse_flush() clears pmd entry and does > TLB shootdown on x86. I'm supposed the main purpose is to serialize fast gup > since it doesn't acquire any lock (mmap_sem, ptl ,etc), but disable > interrupt so the TLB shootdown IPI would get blocked. This could guarantee > synchronization on x86, but it looks not all architectures do TLB shootdown > or implement it via IPI, so how they could serialize with fast gup? The main purpose of pmdp_collapse_flush() is to block access to pages under collapse, including access via GUP (and its variants). It's up to architecture to implement it correctly, including TLB flush vs. GUP_fast serialization. Genetic way works fine for most architectures. Notable exceptions are Power and S390. > In addition it looks acquiring pmd lock is not necessary. Before both write > mmap_sem and write anon_vma lock are acquired which could serialize page > fault and rmap walk, so it looks fast gup is the only one which could run > concurrently, but fast gup doesn't acquire ptl at all. It seems the > pmd_lock/unlock could be removed. This is likely true. And we have a comment there. But taking uncontended lock is check, so why not. -- Kirill A. Shutemov