On Fri, Mar 29, 2019 at 05:25:34PM -0400, Qian Cai wrote:
> On Fri, 2019-03-29 at 12:59 -0700, Matthew Wilcox wrote:
> > Oh ... is it a race?
> >
> > so CPU A does:
> >
> > page = find_get_page(swap_address_space(entry), offset)
> > 	page = find_subpage(page, offset);
> > trylock_page(page);
> >
> > while CPU B does:
> >
> > xa_lock_irq(&address_space->i_pages);
> > __delete_from_swap_cache(page, entry);
> > 	xas_store(&xas, NULL);
> > 	ClearPageSwapCache(page);
> > xa_unlock_irq(&address_space->i_pages);
> >
> > and if the ClearPageSwapCache happens between the xas_load() and the
> > find_subpage(), we're stuffed.  CPU A has a reference to the page, but
> > not a lock, and find_get_page is running under RCU.
> >
> > I suppose we could fix this by taking the i_pages xa_lock around the
> > call to find_get_pages().  If indeed, that's what this problem is.
> > Want to try this patch?
>
> Confirmed. Well spotted!

Excellent!  I'm not comfortable with the rule that you have to be holding
the i_pages lock in order to call find_get_page() on a swap address_space.
How does this look to the various smart people who know far more about the
MM than I do?

The idea is to ensure that if this race does happen, the page will be
handled the same way as a pagecache page.  If __delete_from_swap_cache()
can be called while the page is still part of a VMA, then this patch
will break page_to_pgoff().  But I don't think that can happen ... ?

(also, is this the right memory barrier to use to ensure that the old
value of page->index cannot be observed if the PageSwapCache bit is
observed as having been cleared?)

diff --git a/mm/swap_state.c b/mm/swap_state.c
index 2e15cc335966..a715efcf0991 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -165,13 +165,16 @@ void __delete_from_swap_cache(struct page *page, swp_entry_t entry)
 	VM_BUG_ON_PAGE(!PageSwapCache(page), page);
 	VM_BUG_ON_PAGE(PageWriteback(page), page);
 
+	page->index = idx;
+	smp_mb__before_atomic();
+	ClearPageSwapCache(page);
+
 	for (i = 0; i < nr; i++) {
 		void *entry = xas_store(&xas, NULL);
 		VM_BUG_ON_PAGE(entry != page, entry);
 		set_page_private(page + i, 0);
 		xas_next(&xas);
 	}
-	ClearPageSwapCache(page);
 	address_space->nrpages -= nr;
 	__mod_node_page_state(page_pgdat(page), NR_FILE_PAGES, -nr);
 	ADD_CACHE_INFO(del_total, nr);
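
To make the barrier question concrete, here is a rough userspace model of
the ordering the patch relies on.  It is only a sketch in C11 atomics, not
kernel code: struct model_page, MODEL_PG_SWAPCACHE and both functions are
hypothetical names, the seq_cst fence merely stands in for
smp_mb__before_atomic(), and the acquire fence on the reader side is an
assumption about what a lockless reader would need (something like a
paired smp_rmb()); nothing in the patch adds one.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Userspace model only: every name here is hypothetical, not kernel API. */
#define MODEL_PG_SWAPCACHE 0x1u

struct model_page {
	atomic_uint flags;		/* stands in for page->flags */
	_Atomic uint64_t index;		/* stands in for page->index */
	_Atomic uint64_t private_off;	/* stands in for the swap offset in page->private */
};

/* Writer: publish the new index, then clear the flag (same order as the patch). */
static void model_delete_from_swap_cache(struct model_page *p, uint64_t idx)
{
	atomic_store_explicit(&p->index, idx, memory_order_relaxed);
	/* Full fence, standing in for smp_mb__before_atomic(). */
	atomic_thread_fence(memory_order_seq_cst);
	/* Relaxed read-modify-write, standing in for ClearPageSwapCache(). */
	atomic_fetch_and_explicit(&p->flags, ~MODEL_PG_SWAPCACHE,
				  memory_order_relaxed);
}

/* Reader: choose the offset without holding any lock. */
static uint64_t model_page_offset(struct model_page *p)
{
	unsigned int flags = atomic_load_explicit(&p->flags, memory_order_relaxed);

	if (flags & MODEL_PG_SWAPCACHE)
		return atomic_load_explicit(&p->private_off, memory_order_relaxed);

	/*
	 * Assumed reader-side pairing: without this fence the index load
	 * below is not ordered after the flags load above, so a stale
	 * index could still be observed.
	 */
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&p->index, memory_order_relaxed);
}
```

In this model, a reader that sees the flag cleared is guaranteed to see the
new index only because both fences are present.  Whether the real readers
have an equivalent ordering on their side is exactly the open question.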