On 2024/3/23 03:37, Yosry Ahmed wrote:
> On Fri, Mar 22, 2024 at 9:40 AM <chengming.zhou@xxxxxxxxx> wrote:
>>
>> From: Chengming Zhou <chengming.zhou@xxxxxxxxx>
>>
>> There is a report of data corruption caused by double swapin, which is
>> only possible in the skip-swapcache path on SWP_SYNCHRONOUS_IO backends.
>>
>> The root cause is that zswap is unlike other "normal" swap backends:
>> it does not keep a copy of the data after the first swapin. So if the
>> folio loaded on the first swapin can't be installed in the page table
>> successfully, we just free it directly. On the second swapin we then
>> find nothing in zswap and read stale data from the swapfile, which is
>> how this data corruption happens.
>>
>> We can fix it by always adding the folio to the swapcache when we know
>> the pinned swap entry can be found in zswap, so the data won't get
>> freed even if the folio can't be installed successfully on the first
>> swapin.
>
> A concurrent faulting thread could have already checked the swapcache
> before we add the folio to it, right? In this case, that thread will
> go ahead and call swap_read_folio() anyway.

Right, but it has to lock the folio to proceed.

>
> Also, I suspect the zswap lookup might hurt performance. Would it be
> better to add the folio back to zswap upon failure? This should be
> detectable by checking if the folio is dirty, as I mentioned in the bug
> report thread.

Yes, it may hurt performance. As for adding the folio back to zswap upon
failure, the problem is that the add may fail too... and I don't know
how to handle that.

Anyway, I think the fix from Johannes is much better; we should take
that approach.

Thanks.