On Fri, Mar 22, 2024 at 9:40 AM <chengming.zhou@xxxxxxxxx> wrote:
>
> From: Chengming Zhou <chengming.zhou@xxxxxxxxx>
>
> There is a report of data corruption caused by double swapin, which is
> only possible in the skip swapcache path on SWP_SYNCHRONOUS_IO backends.
>
> The root cause is that zswap is not like other "normal" swap backends,
> it won't keep the copy of data after the first time of swapin. So if
> the folio in the first time of swapin can't be installed in the pagetable
> successfully and we just free it directly. Then in the second time of
> swapin, we can't find anything in zswap and read wrong data from swapfile,
> so this data corruption problem happened.
>
> We can fix it by always adding the folio into swapcache if we know the
> pinned swap entry can be found in zswap, so it won't get freed even though
> it can't be installed successfully in the first time of swapin.

A concurrent faulting thread could have already checked the swapcache
before we add the folio to it, right? In this case, that thread will go
ahead and call swap_read_folio() anyway.

Also, I suspect the zswap lookup might hurt performance. Would it be
better to add the folio back to zswap upon failure? This should be
detectable by checking if the folio is dirty as I mentioned in the bug
report thread.