On Tue, Mar 5, 2024 at 11:20 AM Chris Li <chrisl@xxxxxxxxxx> wrote:
>
> On Tue, Mar 5, 2024 at 2:55 AM Nhat Pham <nphamcs@xxxxxxxxx> wrote:
> >
> > On Tue, Mar 5, 2024 at 4:52 PM Chengming Zhou <chengming.zhou@xxxxxxxxx> wrote:
> > >
> > > Looks sensible. Now the zswap middle layer is transparent to frontend users,
> > > which just allocate swap entry and swap out, don't care about whether it's
> > > swapped out to the zswap or swap file.
> > >
> > > By decoupling, the frontend users need to know it want to allocate zswap entry
> > > instead of a swap entry, right? Which becomes not transparent to users.
> >
> > Hmm for now, I was just thinking that it should always try zswap
> > first, and only fall back to swap if it fails to store to zswap, to
> > maintain the overall LRU ordering (best effort).
> >
> > The minimal viable implementation I'm thinking right now for this is
> > basically the "ghost swapfile" approach - i.e represent zswap as a
> > swapfile.
>
> Google has been using the ghost swapfile in production for many years.
> If it helps, I can rebase the ghost swap file patches to mm-unstable
> then send them out for RFC discussion. I am not expecting it to merge
> as it is, just as a starting point for if any one is interested in the
> ghost swap file.
>
> I think zswap with a ghost swap file will make zswap behave more like
> other swap back ends. If you use the ghost swap file, migrating from
> zswap to another swap device is very similar to migrating from SSD to
> hard drive, for example.

Yes please.

> > Writeback becomes quite hairy though, because there might be two
> > "swap" entries of the same object (the zswap swap entry and the newly
> > reserved swap entry) lying around near the end of the writeback step,
> > so gotta be careful with synchronization (read: juggling the swap
> > cache) to make sure concurrent swap-ins get something that makes
> > sense.
>
> Dealing with two swap device entries while writing back from one to
> another is unavoidable. I consider it as necessary evil.
> If we can have swap offset lookup to different swap entry types. One
> idea is to introduce a migration type of swap entry, the swap entry
> will have both source and destination swap entry stored in it. Then
> you just read in the source swap entry data (compressed or not). Write
> to the destination entry. Every swap in of the source swap entry will
> notice it has a migration swap entry type. Then it will ask the
> destination swap device to perform the IO. The same folio will exist
> in both source and destination swap cache.
>
> The limit of this approach is that, unless the source entry usage
> count drops to zero (every user swap in the entry). That source swap
> entry is occupied. It can't be reused for other data.
>
> Chris
>
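
The migration-entry idea quoted above can be modelled in a few lines of
userspace C. This is only a rough sketch to make the bookkeeping concrete,
not kernel code and not taken from any posted patch: every name in it
(struct swap_ref, struct slot, store(), migrate(), swap_in(), is_occupied())
is hypothetical, an int stands in for the page data, and decrementing the
destination's usage count together with the source's is an assumption made
to keep the model small.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */

enum entry_kind { ENTRY_EMPTY, ENTRY_DATA, ENTRY_MIGRATION };

/* Stand-in for a swap entry: (swap device, offset on that device). */
struct swap_ref {
	unsigned int dev;
	unsigned int off;
};

#define NR_DEVS  2
#define NR_SLOTS 8

struct slot {
	enum entry_kind kind;
	unsigned int usage;   /* users that still reference this entry */
	struct swap_ref dst;  /* meaningful only for ENTRY_MIGRATION */
	int payload;          /* stand-in for the (compressed or not) data */
};

/* Zero-initialized, so every slot starts out as ENTRY_EMPTY. */
static struct slot devs[NR_DEVS][NR_SLOTS];

static struct slot *slot_of(struct swap_ref e)
{
	assert(e.dev < NR_DEVS && e.off < NR_SLOTS);
	return &devs[e.dev][e.off];
}

/* Drop one user; the slot becomes reusable once nobody references it. */
static void put_slot(struct slot *s)
{
	assert(s->usage > 0);
	if (--s->usage == 0)
		s->kind = ENTRY_EMPTY;
}

bool is_occupied(struct swap_ref e)
{
	return slot_of(e)->kind != ENTRY_EMPTY;
}

/* Swap out: place data in an empty slot with the given number of users. */
bool store(struct swap_ref e, int payload, unsigned int usage)
{
	struct slot *s = slot_of(e);

	if (s->kind != ENTRY_EMPTY || usage == 0)
		return false;
	s->kind = ENTRY_DATA;
	s->usage = usage;
	s->payload = payload;
	return true;
}

/*
 * Writeback: copy the data to the destination and turn the source into a
 * migration entry that records where the data went. The source slot stays
 * occupied because existing users still hold the source entry.
 */
bool migrate(struct swap_ref src, struct swap_ref dst)
{
	struct slot *s = slot_of(src);
	struct slot *d = slot_of(dst);

	if (s->kind != ENTRY_DATA || d->kind != ENTRY_EMPTY)
		return false;
	d->kind = ENTRY_DATA;
	d->usage = s->usage;
	d->payload = s->payload;
	s->kind = ENTRY_MIGRATION;
	s->dst = dst;
	return true;
}

/*
 * Swap in: a lookup that hits a migration entry is redirected to the
 * destination device. Each swap-in drops one user from the source (and, in
 * this model, from the destination as well).
 */
bool swap_in(struct swap_ref e, int *out)
{
	struct slot *s = slot_of(e);
	struct slot *data = s;

	if (s->kind == ENTRY_MIGRATION)
		data = slot_of(s->dst);
	if (data->kind != ENTRY_DATA)
		return false;

	*out = data->payload;
	put_slot(data);
	if (data != s)
		put_slot(s);
	return true;
}
```

In this model, storing an entry with two users and then migrating it leaves
the source slot occupied until both users have swapped in through it, which
is the limitation described in the quoted text: the source entry cannot be
reused for other data before its usage count reaches zero.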