On Wed, Mar 01, 2023 at 12:08:17AM +0000, Matthew Wilcox wrote:
> > Can you expand a bit on that? I assume you want to see the swap file
> > behavior more like a normal file system and reuse more of the readpage()
> > and writepage() path.
>
> Actually, no, readpage() and writepage() should be reserved for
> page cache. We now have a ->swap_rw(), but it's only implemented by
> nfs so far. Instead of constructing its own BIOs, swap should invoke
> ->swap_rw for every filesystem. I suspect we can do a fairly generic
> block_swap_rw() for the vast majority of filesystems.

The swap_rw() is for the file system backing the swap file, so it
sits closer to the back-end IO side. In the case of zswap, it can't
be implemented as a simple file system layer, because a vma can only
belong to one file system, while zswap may back some of the pages in
a vma but not the others. It will require some support before
hitting the swap_rw() paging path.

BTW, in the current code swap_rw() is called from swap_writepage(),
which is part of the writepage() path as well.
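To make that layering concrete, here is a trimmed-down sketch of the
current dispatch, loosely following mm/page_io.c. try_to_free_swap(),
the async completion machinery and error handling are all omitted, so
read it as an illustration rather than the verbatim kernel code:

/* Sketch of the current dispatch (simplified from mm/page_io.c).
 * zswap sits behind the frontswap hook here, which is how it can
 * accept or reject each page individually, before the backing
 * file system is ever involved. */
int swap_writepage(struct page *page, struct writeback_control *wbc)
{
	if (frontswap_store(page) == 0) {
		/* zswap (via frontswap) took this page */
		set_page_writeback(page);
		unlock_page(page);
		end_page_writeback(page);
		return 0;
	}
	/* otherwise fall through to the backing store */
	return __swap_writepage(page, wbc);
}

/* For SWP_FS_OPS swap files (only nfs implements ->swap_rw() so
 * far), __swap_writepage() ends up building a kiocb + iov_iter and
 * calling the file system instead of constructing BIOs itself. */
static int swap_writepage_fs(struct page *page,
			     struct writeback_control *wbc)
{
	struct swap_info_struct *sis = page_swap_info(page);
	struct file *swap_file = sis->swap_file;
	struct bio_vec bv = {
		.bv_page	= page,
		.bv_len		= PAGE_SIZE,
		.bv_offset	= 0,
	};
	struct iov_iter from;
	struct kiocb kiocb;

	iov_iter_bvec(&from, ITER_SOURCE, &bv, 1, PAGE_SIZE);
	init_sync_kiocb(&kiocb, swap_file);
	kiocb.ki_pos = page_file_offset(page);

	return swap_file->f_mapping->a_ops->swap_rw(&kiocb, &from);
}

The point to note is that zswap's hook runs per page, above the file
system boundary, which is why it doesn't fit the one-vma-one-file-
system model.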
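As for the generic block_swap_rw() Matthew suggests, nothing like it
exists in the tree yet; since swap extents are validated at swapon
time, I'd guess the common case could be little more than a direct IO
call. A very rough sketch, where both the name and the reuse of
->direct_IO are my assumptions:

/* Hypothetical block_swap_rw(): a generic ->swap_rw() that most
 * block-backed file systems could share.  Swap IO must bypass the
 * page cache, so just forward to the file system's direct IO
 * implementation.  A real version would need to handle async
 * completion and file systems without ->direct_IO. */
static int block_swap_rw(struct kiocb *iocb, struct iov_iter *iter)
{
	struct address_space *mapping = iocb->ki_filp->f_mapping;
	ssize_t ret;

	ret = mapping->a_ops->direct_IO(iocb, iter);

	return ret < 0 ? ret : 0;
}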
> > When the page fault happens, does the whole folio get swapped in
> > or broken into smaller pages?
>
> I think the whole folio should be swapped in. See my proposal for
> determining the correct size folio to use here:
> https://lore.kernel.org/linux-mm/Y%2FU8bQd15aUO97vS@xxxxxxxxxxxxxxxxxxxx/
>
> Assuming something like that gets implemented, for a large folio to
> be swapped out, we've had a selection of page faults on the folio,
> followed by a period of no faults. All of a sudden we have a fault,
> so I think we should bring the whole folio back in. The algorithm I
> outline in that email would then take care of breaking down the folio
> into smaller folios if it turns out they're not used.
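If it helps the discussion, whole-folio swap-in could look roughly
like the sketch below. Everything here is hypothetical:
swapin_whole_folio() is a made-up name, recovering the order the
folio had at swap-out time is exactly the open question, and swap
cache insertion plus the usual fallbacks are skipped:

/* Hypothetical whole-folio swap-in; not existing code.  On fault,
 * allocate a folio of the size the region had when it was swapped
 * out and read every page of it back in one go. */
static struct folio *swapin_whole_folio(swp_entry_t entry, int order,
					struct vm_fault *vmf)
{
	struct folio *folio;

	folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, order,
				vmf->vma, vmf->address, false);
	if (!folio)
		return NULL;	/* caller falls back to order-0 */

	__folio_set_locked(folio);
	__folio_set_swapbacked(folio);
	set_page_private(&folio->page, entry.val);

	/* Read the whole folio back synchronously.  Per the
	 * algorithm in the link above, subpages that then go unused
	 * would cause the folio to be broken down into smaller
	 * folios later. */
	swap_readpage(&folio->page, true, NULL);

	return folio;
}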
One side effect is that the fault might bring in more pages than are
absolutely necessary. We might want to collect some data on that to
see the real impact.

Chris