Chris Li <chrisl@xxxxxxxxxx> writes: > On Tue, Jan 30, 2024 at 7:58 PM Kairui Song <ryncsn@xxxxxxxxx> wrote: >> >> Hi Ying, >> >> On Wed, Jan 31, 2024 at 10:53 AM Huang, Ying <ying.huang@xxxxxxxxx> wrote: >> > >> > Hi, Minchan, >> > >> > When I review the patchset from Kairui, I checked the code to skip swap >> > cache in do_swap_page() for swap device with SWP_SYNCHRONOUS_IO. Is the >> > following race possible? Where a page is swapped out to a swap device >> > with SWP_SYNCHRONOUS_IO and the swap count is 1. Then 2 threads of the >> > process runs on CPU0 and CPU1 as below. CPU0 is running do_swap_page(). >> >> Chris raised a similar issue about the shmem path, and I was worrying >> about the same issue in previous discussions about do_swap_page: >> https://lore.kernel.org/linux-mm/CAMgjq7AwFiDb7cAMkWMWb3vkccie1-tocmZfT7m4WRb_UKPghg@xxxxxxxxxxxxxx/ > > Ha thanks for remembering that. > >> >> """ >> In do_swap_page path, multiple process could swapin the page at the >> same time (a mapped once page can still be shared by sub threads), >> they could get different folios. The later pte lock and pte_same check >> is not enough, because while one process is not holding the pte lock, >> another process could read-in, swap_free the entry, then swap-out the >> page again, using same entry, an ABA problem. The race is not likely >> to happen in reality but in theory possible. >> """ >> >> > >> > CPU0 CPU1 >> > ---- ---- >> > swap_cache_get_folio() >> > check sync io and swap count >> > alloc folio >> > swap_readpage() >> > folio_lock_or_retry() >> > swap in the swap entry >> > write page >> > swap out to same swap entry >> > pte_offset_map_lock() >> > check pte_same() >> > swap_free() <-- new content lost! >> > set_pte_at() <-- stale page! >> > folio_unlock() >> > pte_unmap_unlock() >> >> Thank you very much for highlighting this! >> >> My concern previously is the same as yours (swapping out using the >> same entry is like an ABA issue, where pte_same failed to detect the >> page table change), later when working on V3, I mistakenly thought >> that's impossible as entry should be pinned until swap_free on CPU0, >> and I'm wrong. CPU1 can also just call swap_free, then swap count is >> dropped to 0 and it can just swap out using the same entry. Now I >> think my patch 6/7 is also affected by this potential race. Seems >> nothing can stop it from doing this. >> >> Actually I was trying to make a reproducer locally, due to swap slot >> cache, swap allocation algorithm, and the short race window, this is >> very unlikely to happen though. > > You can put some sleep in some of the CPU0 where expect the other race > to happen to manual help triggering it. Yes, it sounds hard to trigger > in real life due to reclaim swap out. > >> >> How about we just increase the swap count temporarily in the direct >> swap in path (after alloc folio), then drop the count after pte_same >> (or shmem_add_to_page_cache in shmem path)? That seems enough to >> prevent the entry reuse issue. > > Sounds like a good solution. Yes. It seems that this can solve the race. -- Best Regards, Huang, Ying