On Tue, Mar 5, 2024 at 2:55 AM Nhat Pham <nphamcs@xxxxxxxxx> wrote:
>
> On Tue, Mar 5, 2024 at 4:52 PM Chengming Zhou <chengming.zhou@xxxxxxxxx> wrote:
> >
> > Looks sensible. Now the zswap middle layer is transparent to frontend users,
> > which just allocate swap entry and swap out, don't care about whether it's
> > swapped out to the zswap or swap file.
> >
> > By decoupling, the frontend users need to know it want to allocate zswap entry
> > instead of a swap entry, right? Which becomes not transparent to users.
>
> Hmm for now, I was just thinking that it should always try zswap
> first, and only fall back to swap if it fails to store to zswap, to
> maintain the overall LRU ordering (best effort).
>
> The minimal viable implementation I'm thinking right now for this is
> basically the "ghost swapfile" approach - i.e represent zswap as a
> swapfile.

Google has been using the ghost swapfile in production for many years. If it helps, I can rebase the ghost swapfile patches onto mm-unstable and send them out as an RFC for discussion. I don't expect them to be merged as-is; they would just be a starting point for anyone interested in the ghost swapfile.

I think a ghost swapfile would make zswap behave more like the other swap backends. With a ghost swapfile, migrating from zswap to another swap device is very similar to, for example, migrating from an SSD to a hard drive.

> Writeback becomes quite hairy though, because there might be two
> "swap" entries of the same object (the zswap swap entry and the newly
> reserved swap entry) lying around near the end of the writeback step,
> so gotta be careful with synchronization (read: juggling the swap
> cache) to make sure concurrent swap-ins get something that makes
> sense.

Dealing with two swap entries on two devices while writing back from one to the other is unavoidable; I consider it a necessary evil. It would help if the swap offset lookup could resolve to different swap entry types.
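To make the ghost swapfile idea concrete, here is a minimal toy sketch in plain userspace C (not kernel code, and not the Google patches) of swap entries encoded as a (type, offset) pair, where swap-in picks the backend purely from the type. Type 0 stands for a ghost swapfile whose offsets exist only in zswap; type 1 stands for a real block device. The bit layout, the `swap_backend` table and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy encoding: high bits hold the swap type (which device),
 * low bits the offset within it. This is NOT the kernel's swp_entry_t layout. */
#define SWP_TYPE_SHIFT 58
#define SWP_OFFSET_MASK ((UINT64_C(1) << SWP_TYPE_SHIFT) - 1)

typedef struct { uint64_t val; } swp_entry;

enum backend_kind { BACKEND_GHOST_ZSWAP, BACKEND_BLOCK_DEVICE };

/* Hypothetical per-type device table. A ghost swapfile only provides an
 * offset namespace for zswap; there are no disk blocks behind it. */
static const enum backend_kind swap_backend[] = {
    BACKEND_GHOST_ZSWAP,   /* type 0: ghost swapfile backed by zswap */
    BACKEND_BLOCK_DEVICE,  /* type 1: e.g. an SSD swap partition */
};
#define NR_SWAP_TYPES (sizeof(swap_backend) / sizeof(swap_backend[0]))

static swp_entry make_entry(unsigned type, uint64_t offset)
{
    assert(type < NR_SWAP_TYPES);
    assert(offset <= SWP_OFFSET_MASK);
    swp_entry e = { ((uint64_t)type << SWP_TYPE_SHIFT) | offset };
    return e;
}

static unsigned entry_type(swp_entry e) { return (unsigned)(e.val >> SWP_TYPE_SHIFT); }
static uint64_t entry_offset(swp_entry e) { return e.val & SWP_OFFSET_MASK; }

/* Swap-in resolves the backend from the entry's type alone, so zswap
 * (ghost) entries and block-device entries share one lookup path. */
static enum backend_kind swap_in_backend(swp_entry e)
{
    unsigned type = entry_type(e);
    assert(type < NR_SWAP_TYPES);
    return swap_backend[type];
}
```

In this model, moving a page from zswap to a disk is just moving it from one swap type to another, which is the sense in which zswap-to-SSD migration looks like SSD-to-HDD migration.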
One idea is to introduce a migration swap entry type: the entry stores both the source and the destination swap entries. Writeback then just reads in the source entry's data (compressed or not) and writes it to the destination entry. Every swap-in of the source entry will notice that it is a migration entry and ask the destination swap device to perform the IO. The same folio will exist in both the source and the destination swap caches.

The limitation of this approach is that the source swap entry stays occupied until its usage count drops to zero (i.e. every user has swapped in the entry); until then it can't be reused for other data.

Chris
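The migration-entry idea above can be modeled as a minimal toy in plain userspace C (not kernel code). A source slot keeps its usage count; writeback copies its data into a freshly reserved destination slot and marks the source as a migration entry pointing at it; each swap-in of the source reads from the destination and drops one reference. The `slot` struct, `store()`, `writeback()`, `swap_in()` and the fixed-size arrays are hypothetical, and swap cache handling and locking are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NSLOTS 8
#define PAGE_BYTES 16   /* toy page size */

/* Toy model of one swap slot; all names are hypothetical. */
struct slot {
    bool in_use;
    int usage;              /* remaining users (like a swap_map count) */
    bool migrating;         /* source slot has become a migration entry */
    int dst;                /* destination slot index when migrating */
    char data[PAGE_BYTES];
};

static struct slot src_dev[NSLOTS];   /* e.g. zswap behind a ghost swapfile */
static struct slot dst_dev[NSLOTS];   /* e.g. a real swap device */

static int alloc_slot(struct slot *dev)
{
    for (int i = 0; i < NSLOTS; i++)
        if (!dev[i].in_use) {
            memset(&dev[i], 0, sizeof(dev[i]));
            dev[i].in_use = true;
            return i;
        }
    return -1;
}

/* Store a page in the source device with the given number of users. */
static int store(const char *page, int users)
{
    int i = alloc_slot(src_dev);
    if (i >= 0) {
        strncpy(src_dev[i].data, page, PAGE_BYTES - 1);  /* stays NUL-terminated */
        src_dev[i].usage = users;
    }
    return i;
}

/* Writeback: copy the source data to a new destination slot and turn the
 * source slot into a migration entry that records the destination. */
static int writeback(int src)
{
    int dst = alloc_slot(dst_dev);
    if (dst < 0)
        return -1;
    memcpy(dst_dev[dst].data, src_dev[src].data, PAGE_BYTES);
    src_dev[src].migrating = true;
    src_dev[src].dst = dst;
    return dst;
}

/* Swap-in of a source entry: a migration entry redirects the read to the
 * destination. The source slot is freed only when its last user is gone. */
static void swap_in(int src, char out[PAGE_BYTES])
{
    struct slot *s = &src_dev[src];
    assert(s->in_use && s->usage > 0);
    memcpy(out, s->migrating ? dst_dev[s->dst].data : s->data, PAGE_BYTES);
    if (--s->usage == 0) {
        if (s->migrating)
            dst_dev[s->dst].in_use = false;  /* toy simplification */
        s->in_use = false;
    }
}
```

With a page stored for two users and then written back, the first swap-in already reads the data from the destination, but the source slot remains allocated until the second swap-in, which is the occupancy limitation described above. Releasing the destination together with the source is only a simplification of this toy, not part of the proposal.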