On Wed, Mar 18, 2020 at 10:39:21PM -0700, Yang Shi wrote: > > > On 3/18/20 5:55 PM, Yang Shi wrote: > > > > > > On 3/18/20 5:12 PM, Kirill A. Shutemov wrote: > > > On Thu, Mar 19, 2020 at 07:19:42AM +0800, Yang Shi wrote: > > > > When khugepaged collapses anonymous pages, the base pages would > > > > be freed > > > > via pagevec or free_page_and_swap_cache(). But, the anonymous page may > > > > be added back to LRU, then it might result in the below race: > > > > > > > > CPU A CPU B > > > > khugepaged: > > > > unlock page > > > > putback_lru_page > > > > add to lru > > > > page reclaim: > > > > isolate this page > > > > try_to_unmap > > > > page_remove_rmap <-- corrupt _mapcount > > > > > > > > It looks nothing would prevent the pages from isolating by reclaimer. > > > Hm. Why should it? > > > > > > try_to_unmap() doesn't exclude parallel page unmapping. _mapcount is > > > protected by ptl. And this particular _mapcount pin is reachable for > > > reclaim as it's not part of usual page table tree. Basically > > > try_to_unmap() will never succeeds until we give up the _mapcount on > > > khugepaged side. > > > > I don't quite get. What does "not part of usual page table tree" means? > > > > How's about try_to_unmap() acquires ptl before khugepaged? The page table we are dealing with was detached from the process' page table tree: see pmdp_collapse_flush(). try_to_unmap() will not see the pte. try_to_unmap() can only reach the ptl if split ptl is disabled (mm->page_table_lock is used), but it still will not be able to reach pte. > > > > > > I don't see the issue right away. > > > > > > > The other problem is the page's active or unevictable flag might be > > > > still set when freeing the page via free_page_and_swap_cache(). > > > So what? > > > > The flags may leak to page free path then kernel may complain if > > DEBUG_VM is set. Could you elaborate on what codepath you are talking about? > > > > The putback_lru_page() would not clear those two flags if the pages are > > > > released via pagevec, it sounds nothing prevents from isolating active > > Sorry, this is a typo. If the page is freed via pagevec, active and > unevictable flag would get cleared before freeing by page_off_lru(). > > But, if the page is freed by free_page_and_swap_cache(), these two flags are > not cleared. But, it seems this path is hit rare, the pages are freed by > pagevec for the most cases. > > > > > or unevictable pages. > > > Again, why should it? vmscan is equipped to deal with this. > > > > I don't mean vmscan, I mean khugepaged may isolate active and > > unevictable pages since it just simply walks page table. Why it is wrong? lru_cache_add() only complains if both flags set, it shouldn't happen. > > > > However I didn't really run into these problems, just in theory > > > > by visual > > > > inspection. > > > > > > > > And, it also seems unnecessary to have the pages add back to LRU > > > > again since > > > > they are about to be freed when reaching this point. So, > > > > clearing active > > > > and unevictable flags, unlocking and dropping refcount from isolate > > > > instead of calling putback_lru_page() as what page cache collapse does. > > > Hm? But we do call putback_lru_page() on the way out. I do not follow. > > > > It just calls putback_lru_page() at error path, not success path. > > Putting pages back to lru on error path definitely makes sense. Here it > > is the success path. I agree that putting the apage on LRU just before free the page is suboptimal, but I don't see it as a critical issue. -- Kirill A. Shutemov