Re: [LSF/MM/BPF TOPIC] Swap Abstraction / Native Zswap

Chris Li <chrisl@xxxxxxxxxx> · Tue, 28 Feb 2023 15:11:52 -0800

Hi Yosry,

On Sat, Feb 18, 2023 at 02:38:40PM -0800, Yosry Ahmed wrote:
> Hello everyone,
> 
> I would like to propose a topic for the upcoming LSF/MM/BPF in May
> 2023 about swap & zswap (hope I am not too late).

I am very interested in participating in this discussion as well.

> ==================== Objective ====================
> Enabling the use of zswap without a backing swapfile, which makes
> zswap useful for a wider variety of use cases. Also, when zswap is
> used with a swapfile, the pages in zswap do not use up space in the
> swapfile, so the overall swapping capacity increases.

Agree.

> 
> ==================== Idea ====================
> Introduce a data structure, which I currently call a swap_desc, as an
> abstraction layer between swapping implementation and the rest of MM
> code. Page tables & page caches would store a swap id (encoded as a
> swp_entry_t) instead of directly storing the swap entry associated
> with the swapfile. This swap id maps to a struct swap_desc, which acts

Can you provide a bit more detail? I am curious how this swap id
maps to the swap_desc. Is the swp_entry_t cast to a "struct
swap_desc *", or does it go through some lookup table/tree?
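
To make the question concrete, here is a rough userspace sketch of the
two options. Everything in it is a hypothetical stand-in on my side (the
swap_desc fields, the helper names, the flat table), not something taken
from your proposal. A real pointer encoding would also have to fit in the
bits a non-present PTE leaves free, and a real table would more likely be
an xarray or similar than a fixed array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout; the proposal does not spell out the fields. */
struct swap_desc {
	int backend;		/* e.g. 0 = swapfile slot, 1 = zswap entry */
	unsigned long handle;	/* backend-specific payload */
};

/* Userspace stand-in for the kernel's swp_entry_t wrapper type. */
typedef struct { unsigned long val; } swp_entry_t;

/* Option A: the swap id is the descriptor pointer, so lookup is a cast. */
static inline swp_entry_t desc_to_id_cast(struct swap_desc *desc)
{
	/* Only works where unsigned long can hold a pointer. */
	assert(sizeof(unsigned long) >= sizeof(void *));
	return (swp_entry_t){ .val = (unsigned long)(uintptr_t)desc };
}

static inline struct swap_desc *id_to_desc_cast(swp_entry_t id)
{
	return (struct swap_desc *)(uintptr_t)id.val;
}

/* Option B: the swap id is an index, resolved through a table. */
#define SWAP_DESC_TABLE_SIZE 1024
static struct swap_desc *swap_desc_table[SWAP_DESC_TABLE_SIZE];

static inline int swap_desc_table_store(unsigned long idx,
					struct swap_desc *desc)
{
	if (idx >= SWAP_DESC_TABLE_SIZE)
		return -1;
	swap_desc_table[idx] = desc;
	return 0;
}

static inline struct swap_desc *id_to_desc_lookup(swp_entry_t id)
{
	if (id.val >= SWAP_DESC_TABLE_SIZE)
		return NULL;
	return swap_desc_table[id.val];
}
```

Option A saves a lookup on every fault but ties the id to the lifetime
of the descriptor; option B costs a lookup but lets the descriptor move
or be freed independently of what is stored in the page tables.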

> as our abstraction layer. All MM code not concerned with swapping
> details would operate in terms of swap descs. The swap_desc can point
> to either a normal swap entry (associated with a swapfile) or a zswap
> entry. It can also include all non-backend specific operations, such
> as the swapcache (which would be a simple pointer in swap_desc), swap

Does the zswap entry still use the swap slot cache and swap_info_struct?

> This work enables using zswap without a backing swapfile and increases
> the swap capacity when zswap is used with a swapfile. It also creates
> a separation that allows us to skip code paths that don't make sense
> in the zswap path (e.g. readahead). We get to drop zswap's rbtree
> which might result in better performance (less lookups, less lock
> contention).
> 
> The abstraction layer also opens the door for multiple cleanups (e.g.
> removing swapper address spaces, removing swap count continuation
> code, etc). Another nice cleanup that this work enables would be
> separating the overloaded swp_entry_t into two distinct types: one for
> things that are stored in page tables / caches, and for actual swap
> entries. In the future, we can potentially further optimize how we use
> the bits in the page tables instead of sticking everything into the
> current type/offset format.

Looking forward to seeing more details in the upcoming discussion.
> 
> ==================== Cost ====================
> The obvious downside of this is added memory overhead, specifically
> for users that use swapfiles without zswap. Instead of paying one byte
> (swap_map) for every potential page in the swapfile (+ swap count
> continuation), we pay the size of the swap_desc for every page that is
> actually in the swapfile, which I am estimating can be roughly around
> 24 bytes or so, so maybe 0.6% of swapped out memory. The overhead only
> scales with pages actually swapped out. For zswap users, it should be

Is there a way to avoid turning 1 byte into 24 bytes per swapped
page? For users that use swap but not zswap, this is pure overhead.
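
As a back-of-the-envelope check, the two accounting schemes can be
compared like this. It is only a sketch: the 4 KiB page size is my
assumption, the 24 bytes is your rough estimate, swap count continuation
is ignored, and the helper names are made up.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES		4096ULL	/* assumption: 4 KiB pages */
#define SWAP_MAP_BYTES_PER_SLOT	1ULL	/* one swap_map byte per slot */
#define SWAP_DESC_BYTES		24ULL	/* rough estimate from the proposal */

/* Current scheme: cost scales with the size of the swapfile. */
static inline uint64_t swap_map_overhead(uint64_t swapfile_bytes)
{
	assert(swapfile_bytes % PAGE_SIZE_BYTES == 0);
	return swapfile_bytes / PAGE_SIZE_BYTES * SWAP_MAP_BYTES_PER_SLOT;
}

/* Proposed scheme: cost scales with what is actually swapped out. */
static inline uint64_t swap_desc_overhead(uint64_t swapped_out_bytes)
{
	assert(swapped_out_bytes % PAGE_SIZE_BYTES == 0);
	return swapped_out_bytes / PAGE_SIZE_BYTES * SWAP_DESC_BYTES;
}
```

With these numbers, a fully used 16 GiB swapfile costs 4 MiB of swap_map
today versus 96 MiB of swap_desc. The two only break even when about
1/24 (roughly 4%) of the slots are in use.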

It seems what you really need is one bit of information to indicate
that this page is backed by zswap. Then you can have a separate pointer
for the zswap entry.
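
Roughly what I have in mind, as a userspace sketch only. The bitmap, the
small side table and all names here are hypothetical; in the kernel the
side lookup could be whatever structure zswap ends up using. Removal and
locking are left out to keep it short.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define NR_SLOTS	4096
#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)

/* One bit per swap slot: set when the slot's data lives in zswap. */
static unsigned long in_zswap_bitmap[NR_SLOTS / BITS_PER_WORD];

/* Hypothetical stand-in for a zswap entry. */
struct zswap_entry_stub { size_t compressed_len; };

/* Sparse side table (linear probing), paid for only by zswap pages. */
#define SIDE_TABLE_SIZE	256
struct side_slot { unsigned long offset; struct zswap_entry_stub *entry; };
static struct side_slot side_table[SIDE_TABLE_SIZE];

static inline int slot_in_zswap(unsigned long offset)
{
	assert(offset < NR_SLOTS);
	return (in_zswap_bitmap[offset / BITS_PER_WORD] >>
		(offset % BITS_PER_WORD)) & 1UL;
}

static inline int zswap_attach(unsigned long offset,
			       struct zswap_entry_stub *entry)
{
	unsigned long i, probe;

	if (offset >= NR_SLOTS || !entry || slot_in_zswap(offset))
		return -1;
	for (i = 0; i < SIDE_TABLE_SIZE; i++) {
		probe = (offset + i) % SIDE_TABLE_SIZE;
		if (!side_table[probe].entry) {
			side_table[probe].offset = offset;
			side_table[probe].entry = entry;
			in_zswap_bitmap[offset / BITS_PER_WORD] |=
				1UL << (offset % BITS_PER_WORD);
			return 0;
		}
	}
	return -1;	/* side table full */
}

static inline struct zswap_entry_stub *zswap_lookup(unsigned long offset)
{
	unsigned long i, probe;

	/* Swap-only users stop here: a single bit test, no pointer chase. */
	if (offset >= NR_SLOTS || !slot_in_zswap(offset))
		return NULL;
	for (i = 0; i < SIDE_TABLE_SIZE; i++) {
		probe = (offset + i) % SIDE_TABLE_SIZE;
		if (!side_table[probe].entry)
			return NULL;
		if (side_table[probe].offset == offset)
			return side_table[probe].entry;
	}
	return NULL;
}
```

The point is that a swap-only setup pays one bit per slot, and only
pages that actually sit in zswap pay for the pointer.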

Depending on how much you are going to reuse the swap cache, you might
need to have something like a swap_info_struct to keep the locks happy.

> Another potential concern is readahead. With this design, we have no

Readahead is for spinning disks :-) Even a normal swap file on an SSD
could use some modernization.

Looking forward to your discussion.

Chris