On Wed, Sep 4, 2024 at 12:17 AM Barry Song <21cnbao@xxxxxxxxx> wrote: > > On Wed, Sep 4, 2024 at 7:12 PM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote: > > > > [..] > > > > @@ -426,6 +515,26 @@ static void sio_read_complete(struct kiocb *iocb, long ret) > > > > mempool_free(sio, sio_pool); > > > > } > > > > > > > > +static bool swap_read_folio_zeromap(struct folio *folio) > > > > +{ > > > > + unsigned int idx = swap_zeromap_folio_test(folio); > > > > + > > > > + if (idx == 0) > > > > + return false; > > > > + > > > > + /* > > > > + * Swapping in a large folio that is partially in the zeromap is not > > > > + * currently handled. Return true without marking the folio uptodate so > > > > + * that an IO error is emitted (e.g. do_swap_page() will sigbus). > > > > + */ > > > > + if (WARN_ON_ONCE(idx < folio_nr_pages(folio))) > > > > + return true; > > > > > > Hi Usama, Yosry, > > > > > > I feel the warning is wrong as we could have the case where idx==0 > > > is not zeromap but idx=1 is zeromap. idx == 0 doesn't necessarily > > > mean we should return false. > > > > Good catch. Yeah if idx == 0 is not in the zeromap but other indices > > are we will mistakenly read the entire folio from swap. > > > > > > > > What about the below change which both fixes the warning and unblocks > > > large folios swap-in? > > > > But I don't see how that unblocks the large folios swap-in work? We > > still need to actually handle the case where a large folio being > > swapped in is partially in the zeromap. Right now we warn and unlock > > the folio without calling folio_mark_uptodate(), which emits an IO > > error. > > I placed this in mm/swap.h so that during swap-in, it can filter out any large > folios where swap_zeromap_entries_count() is greater than 0 and less than > nr. > > I believe this case would be quite rare, as it can only occur when some small > folios that are swapped out happen to have contiguous and aligned swap > slots. I am assuming this would be near where the zswap_never_enabled() check is today, right? I understand the point of doing this to unblock the synchronous large folio swapin support work, but at some point we're gonna have to actually handle the cases where a large folio being swapped in is partially in the swap cache, zswap, the zeromap, etc. All these cases will need similar-ish handling, and I suspect we won't just skip swapping in large folios in all these cases.