On Tue, Jun 11, 2024 at 4:49 AM Usama Arif <usamaarif642@xxxxxxxxx> wrote: > > @@ -515,8 +600,11 @@ void swap_read_folio(struct folio *folio, bool > synchronous, > >>>> psi_memstall_enter(&pflags); > >>>> } > >>>> delayacct_swapin_start(); > >>>> - > >>>> - if (zswap_load(folio)) { > >>>> + if (swap_zeromap_folio_test(folio)) { > >>>> + folio_zero_fill(folio); > >>>> + folio_mark_uptodate(folio); > >>>> + folio_unlock(folio); > >>> We don't currently support swapping in large folios, but it is a work > >>> in progress, and this will break once we have it. > >>> swap_zeromap_folio_test() will return false even if parts of the folio > >>> are in fact zero-filled. Then, we will go read those from disk swap, > >>> essentially corrupting data. > >> So yes, with this patch I tested swap out of large zero folio, but when > >> swapping in it was page by page. My assumption was that swap_read_folio > >> (when support is added) would only pass a large folio that was earlier > >> swapped out as a large folio. So if a zero filled large folio was > >> swapped out, the zeromap will be set for all the pages in that folio, > >> and at large folio swap in (when it is supported), it will see that all > >> the bits in the zeromap for that folio are set, and will just > >> folio_zero_fill. > >> > >> If even a single page in large folio has non-zero data then zeromap will > >> not store it and it will go to either zswap or disk, and at read time as > >> all the bits in zeromap are not set, it will still goto either zswap or > >> disk, so I think this works? > >> > >> Is my assumption wrong that only large folios can be swapped in only if > >> they were swapped out as large? I think this code works in that case. > > I think the assumption is incorrect. I think we would just check if > > contiguous PTEs have contiguous swap entries and swapin the folio as a > > large folio in this case. It is likely that the swap entries are > > contiguous because it was swapped out as a large folio, but I don't > > think it's guaranteed. > > Yes, makes sense. Thanks for explaining this. > > > > > For example, here is a patch that implements large swapin support for > > the synchronous swapin case, and I think it just checks that the PTEs > > have contiguous swap entries: > > https://lore.kernel.org/linux-mm/20240304081348.197341-6-21cnbao@xxxxxxxxx/ > > > > This makes a lot of sense because otherwise you'd have to keep track > > of how the folios were composed at the time of swapout, to be able to > > swap the same folios back in. > > I think the solution to large folio swap-in for this optimization and > zswap is not in swap_read_folio in this patch-series or any call further > down the stack, as its too late by the time you reach here, but in > Barrys' patch. This needs to happen much earlier when deciding the size > of the folio at folio creation in alloc_swap_folio, after Barry checks > > if (is_zswap_enabled()) > goto fallback; > > once the order is decided, we would need to check the indexes in the > zeromap array starting from swap_offset(entry) and ending at > swap_offset(entry) + 2^order are set. If no bits are set, or all bits > are set, then we allocate large folio. Otherwise, we goto fallback. > > I think its better to handle this in Barrys patch. I feel this series is > close to its final state, i.e. the only diff I have for the next > revision is below to remove start/end_writeback for zer_filled case. I > will comment on Barrys patch once the I send out the next revision of this. Sorry I did not make myself clearer. I did not mean that you should handle the large folio swapin here. This needs to be handled at a higher level because as you mentioned, a large folio may be partially in the zeromap, zswap, swapcache, disk, etc. What I meant is that we should probably have a debug check to make sure this doesn't go unhandled. For zswap, I am trying to add a warning and fail the swapin operation if a large folio slips through to zswap. We can do something similar here if folks agree this is the right way in the interim: https://lore.kernel.org/lkml/20240611024516.1375191-3-yosryahmed@xxxxxxxxxx/. Maybe I am too paranoid, but I think it's easy to mess up these things when working on large folio swapin imo.