On Wed, Feb 05, 2025 at 02:38:39AM +0800, Kairui Song wrote:
> On Wed, Feb 5, 2025 at 2:11 AM Yosry Ahmed <yosry.ahmed@xxxxxxxxx> wrote:
> > However, what we should *not* do is have these clusters be tied to the
> > disk swap space with the ability to redirect some entries to use
> > something like zswap. This does not fix the problem Johannes is
> > describing.
>
> Yes, a virtual swap file can have its own swap space, which is indexed
> by the cache / table, and reuse all the logic. As long as we don't
> dramatically change the kernel swapout path, adding a folio to
> swapcache seems a very reasonable way to avoid redundant IO, and
> synchronize it upon swapin/swapout, and reusing a lot of
> infrastructure, even if that's a virtual file. For example a current
> busy loop issue can be just fixed by leveraging the folio lock:
> https://lore.kernel.org/lkml/CAMgjq7D5qoFEK9Omvd5_Zqs6M+TEoG03+2i_mhuP5CQPSOPrmQ@xxxxxxxxxxxxxx/
>
> The virtual file/space can be decoupled from the lower device. But the
> virtual file/space's table entry can point to an underlying physical
> SWAP device or some meta struct.

It's a bit unclear to me still which level will use the struct
swap_cluster_info in the layered scenario. Would it be the virtual
address space, where ->table has tagged pointers to resolve to
swapcache/zeromap/zswap/swapfile? Or would it be the swapfile space,
where ->table resolves to disk slots? Or are you proposing to use the
same struct on both levels, with ->table catering to different needs?

Keep in mind, in the virtualized case, it's the top layer that would
have to keep track of the page table count, the swapcache pointer and
likely the memcg linkage.

That also means the physical layer could likely be reduced to a single
bit per entry - used or free. I suppose void *table could also point
to such a bitmap? But not sure about the other members that would
become redundant/unused.
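
To make the question concrete, here is a rough user-space sketch of one
possible split, not a proposal for the actual layout. The top layer keeps
a tagged word per entry plus the page table count, and the bottom layer
is only a used/free bitmap. Every name in it (vswap_cluster,
phys_cluster, the VSWAP_* tags, the helpers) is hypothetical and is not
the kernel's struct swap_cluster_info. The 3 tag bits and the 512 slots
per cluster are assumptions chosen for illustration, and locking, the
swapcache pointer and the memcg linkage are left out.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical backends a virtual swap entry can resolve to. */
enum vswap_tag {
	VSWAP_FREE = 0,
	VSWAP_SWAPCACHE,
	VSWAP_ZEROMAP,
	VSWAP_ZSWAP,
	VSWAP_SWAPFILE,
};

#define VSWAP_TAG_BITS	3	/* assumption: enough for the tags above */
#define VSWAP_TAG_MASK	(((uintptr_t)1 << VSWAP_TAG_BITS) - 1)
#define CLUSTER_SLOTS	512	/* assumption: illustrative cluster size */
#define WORD_BITS	(sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS	((CLUSTER_SLOTS + WORD_BITS - 1) / WORD_BITS)

/* Top (virtual) layer: tagged entry plus state only this layer tracks. */
struct vswap_cluster {
	uintptr_t table[CLUSTER_SLOTS];		/* tag in low bits, payload above */
	unsigned char map_count[CLUSTER_SLOTS];	/* page table references */
};

/* Bottom (physical) layer: one bit per disk slot, used or free. */
struct phys_cluster {
	unsigned long used[BITMAP_WORDS];
};

static inline uintptr_t vswap_pack(enum vswap_tag tag, uintptr_t payload)
{
	assert(((uintptr_t)tag & ~VSWAP_TAG_MASK) == 0);
	return (payload << VSWAP_TAG_BITS) | (uintptr_t)tag;
}

static inline enum vswap_tag vswap_tag_of(uintptr_t entry)
{
	return (enum vswap_tag)(entry & VSWAP_TAG_MASK);
}

static inline uintptr_t vswap_payload(uintptr_t entry)
{
	return entry >> VSWAP_TAG_BITS;
}

static inline bool phys_test(const struct phys_cluster *pc, size_t slot)
{
	return (pc->used[slot / WORD_BITS] >> (slot % WORD_BITS)) & 1UL;
}

/* Claim the first free disk slot; returns its index, or -1 if full. */
static inline long phys_alloc(struct phys_cluster *pc)
{
	for (size_t slot = 0; slot < CLUSTER_SLOTS; slot++) {
		if (!phys_test(pc, slot)) {
			pc->used[slot / WORD_BITS] |= 1UL << (slot % WORD_BITS);
			return (long)slot;
		}
	}
	return -1;
}

static inline void phys_free(struct phys_cluster *pc, size_t slot)
{
	pc->used[slot / WORD_BITS] &= ~(1UL << (slot % WORD_BITS));
}

/* Point virtual entry idx at a freshly allocated disk slot. */
static inline bool vswap_to_disk(struct vswap_cluster *vc,
				 struct phys_cluster *pc, size_t idx)
{
	long slot = phys_alloc(pc);

	if (slot < 0)
		return false;
	vc->table[idx] = vswap_pack(VSWAP_SWAPFILE, (uintptr_t)slot);
	return true;
}

/* Redirect virtual entry idx to another backend, releasing any disk slot. */
static inline void vswap_redirect(struct vswap_cluster *vc,
				  struct phys_cluster *pc, size_t idx,
				  enum vswap_tag tag, uintptr_t payload)
{
	uintptr_t old = vc->table[idx];

	if (vswap_tag_of(old) == VSWAP_SWAPFILE)
		phys_free(pc, (size_t)vswap_payload(old));
	vc->table[idx] = vswap_pack(tag, payload);
}
```

In this sketch the virtual index seen by the page tables never changes
when an entry moves between backends; only the tagged word in the top
layer's table is rewritten, and the bottom layer merely flips a bit.
Whether one struct could serve both layers is exactly the open question
above, and the sketch does not settle it.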