Re: [LSF/MM/BPF TOPIC] Swap Abstraction / Native Zswap

Yosry Ahmed <yosryahmed@xxxxxxxxxx> · Thu, 2 Mar 2023 14:55:09 -0800

On Thu, Mar 2, 2023 at 2:36 PM Rik van Riel <riel@xxxxxxxxxxx> wrote:
>
> On Thu, 2023-03-02 at 13:42 -0800, Chris Li wrote:
> > On Thu, Mar 02, 2023 at 01:23:14PM -0500, Rik van Riel wrote:
> > >
> > >
> > > One possible implementation might be to have swap page table entries
> > > point to a swap address in this indirection layer, and the indirection
> > > layer can be an xarray containing the actual swap entries specifying
> > > at which position in which swap device the data can be found.
> >
> > The question is: do we apply this indirection layer to all swap
> > entries?
> >
> I believe we should have a system that tracks every swap entry
> the same way, data-structure-wise. Otherwise we will have two sets
> of code in the kernel, and it will be too easy to get corner
> cases wrong.
>
> > My small tweak is to limit the indirection layer to non-leaf swap
> > devices only. Then it is actually very close to what I am proposing:
> > your "indirection layer" is just my "special swap device".
> >
> > Again, "special swap device" is a very bad name; let's name it
> > something more useful.
> >
> > > That might be a net reduction in the code over what we have today,
> > > because it gets rid of some ugly corner cases.
> >
> > Great.
>
> ... but that won't happen if the indirection layer only applies
> to some swap devices, because we will still need to keep around
> the crazy code to deal with the swap devices that don't have it.
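
A rough illustration of the indirection layer discussed above: the page table entry stores only a virtual swap address, and a lookup table maps that address to the device and offset that actually hold the data. This is a user-space sketch, assuming a flat array in place of the kernel xarray; the names vswap_table, vswap_alloc(), vswap_lookup(), and vswap_relocate() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: a flat array stands in for the xarray that
 * would map a virtual swap address to a physical swap location. */
#define NR_VSWAP 64

struct swap_loc {
	int device;      /* which swap device holds the data */
	uint64_t offset; /* slot within that device */
	int used;        /* nonzero if this virtual address is allocated */
};

static struct swap_loc vswap_table[NR_VSWAP];

/* Allocate a virtual swap address; a PTE would store the returned index. */
static long vswap_alloc(int device, uint64_t offset)
{
	for (long i = 0; i < NR_VSWAP; i++) {
		if (!vswap_table[i].used) {
			vswap_table[i] = (struct swap_loc){ device, offset, 1 };
			return i;
		}
	}
	return -1; /* table full */
}

/* Resolve a virtual swap address to its current physical location. */
static const struct swap_loc *vswap_lookup(long vaddr)
{
	assert(vaddr >= 0 && vaddr < NR_VSWAP && vswap_table[vaddr].used);
	return &vswap_table[vaddr];
}

/* Moving the data rewrites only the table entry; PTEs holding vaddr
 * do not need to change. */
static void vswap_relocate(long vaddr, int new_device, uint64_t new_offset)
{
	assert(vaddr >= 0 && vaddr < NR_VSWAP && vswap_table[vaddr].used);
	vswap_table[vaddr].device = new_device;
	vswap_table[vaddr].offset = new_offset;
}
```

In this sketch, relocation only rewrites the table entry, so any page table entries holding that virtual address stay valid; the price is one extra lookup on every swap access.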

I agree with Rik here. We can certainly special-case the indirection
layer and only apply it to some swap backends (e.g. zswap), but this
makes things more complicated. For example, if each swap backend
maintains swap_count in its own way, we have to hand over the swap
count when we move a swapped page between backends.
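
To make the hand-over concrete, here is a hypothetical sketch in which each backend keeps its own per-slot swap counts (struct backend, backend_alloc(), and backend_move() are illustrative names, not kernel code). Moving a page means copying the count along with the data and clearing it at the source:

```c
#include <assert.h>

#define NR_SLOTS 16

/* Hypothetical backend that keeps its own swap counts. */
struct backend {
	const char *name;
	unsigned char count[NR_SLOTS]; /* per-slot swap count; 0 means free */
	int data[NR_SLOTS];            /* stand-in for page contents */
};

/* Find a free slot (count == 0); the caller sets its count. */
static int backend_alloc(struct backend *b)
{
	for (int i = 0; i < NR_SLOTS; i++)
		if (b->count[i] == 0)
			return i;
	return -1;
}

/* Move one swapped page from src to dst. The swap count must be handed
 * over explicitly: forgetting it would leave dst's slot looking free
 * even though the page is still referenced. */
static int backend_move(struct backend *src, int src_slot, struct backend *dst)
{
	assert(src->count[src_slot] > 0);
	int dst_slot = backend_alloc(dst);
	if (dst_slot < 0)
		return -1;
	dst->data[dst_slot] = src->data[src_slot];
	dst->count[dst_slot] = src->count[src_slot]; /* the hand-over */
	src->count[src_slot] = 0;                    /* release the old slot */
	return dst_slot;
}
```

Both toy backends use the same count representation here; if they did not, backend_move() would also have to convert between them, which is the kind of corner case a shared data structure avoids.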

With a common data structure like the proposed swap_desc, everything
becomes easier to reason about. The core swapping logic that is
agnostic to the backend, like the swapcache and swap counting, lives in
one common place. Swap backends like swapfiles or zswap can then
implement a common interface for backend-specific operations, like
allocating entries, reading/writing pages, etc.
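
A minimal user-space sketch of that split, assuming a hypothetical swap_desc layout (swap_count, swapcache, backend, slot) and a hypothetical swap_backend_ops table; the proposal's actual fields and interface may differ. The core functions swap_desc_init(), swap_desc_dup(), swap_desc_put(), and swap_desc_move() never look inside a backend, and a single toy in-memory backend implements the ops:

```c
#include <assert.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 16 /* tiny "page" keeps the sketch small */
#define NR_SLOTS 8

/* Backend-specific operations (hypothetical interface). */
struct swap_backend_ops {
	int  (*alloc)(void *priv); /* returns a slot, or -1 if full */
	void (*write)(void *priv, int slot, const char *page);
	void (*read)(void *priv, int slot, char *page);
	void (*free_slot)(void *priv, int slot);
};

struct swap_backend {
	const struct swap_backend_ops *ops;
	void *priv; /* backend-private state */
};

/* Backend-agnostic state, kept once per swapped page. */
struct swap_desc {
	unsigned int swap_count;      /* references from page tables */
	char *swapcache;              /* cached copy, or NULL (unused here) */
	struct swap_backend *backend; /* where the data currently lives */
	int slot;                     /* location inside that backend */
};

/* Toy in-memory backend; a swapfile or zswap backend would fill in the same ops. */
struct mem_backend {
	char pages[NR_SLOTS][SKETCH_PAGE_SIZE];
	int used[NR_SLOTS];
};

static int mem_alloc(void *priv)
{
	struct mem_backend *m = priv;
	for (int i = 0; i < NR_SLOTS; i++) {
		if (!m->used[i]) {
			m->used[i] = 1;
			return i;
		}
	}
	return -1;
}
static void mem_write(void *priv, int slot, const char *page)
{
	memcpy(((struct mem_backend *)priv)->pages[slot], page, SKETCH_PAGE_SIZE);
}
static void mem_read(void *priv, int slot, char *page)
{
	memcpy(page, ((struct mem_backend *)priv)->pages[slot], SKETCH_PAGE_SIZE);
}
static void mem_free_slot(void *priv, int slot)
{
	((struct mem_backend *)priv)->used[slot] = 0;
}

static const struct swap_backend_ops mem_ops = { mem_alloc, mem_write, mem_read, mem_free_slot };

/* Core logic below is identical for every backend. */
static int swap_desc_init(struct swap_desc *d, struct swap_backend *b, const char *page)
{
	int slot = b->ops->alloc(b->priv);
	if (slot < 0)
		return -1;
	b->ops->write(b->priv, slot, page);
	*d = (struct swap_desc){ .swap_count = 1, .backend = b, .slot = slot };
	return 0;
}

static void swap_desc_dup(struct swap_desc *d) { d->swap_count++; }

static void swap_desc_put(struct swap_desc *d)
{
	assert(d->swap_count > 0);
	if (--d->swap_count == 0) {
		d->backend->ops->free_slot(d->backend->priv, d->slot);
		d->backend = NULL;
	}
}

/* Moving between backends changes only backend and slot; swap_count is untouched. */
static int swap_desc_move(struct swap_desc *d, struct swap_backend *dst)
{
	char page[SKETCH_PAGE_SIZE];
	int slot = dst->ops->alloc(dst->priv);
	if (slot < 0)
		return -1;
	d->backend->ops->read(d->backend->priv, d->slot, page);
	dst->ops->write(dst->priv, slot, page);
	d->backend->ops->free_slot(d->backend->priv, d->slot);
	d->backend = dst;
	d->slot = slot;
	return 0;
}
```

Moving a page between backends here only updates d->backend and d->slot; unlike the per-backend approach, there is no swap count to hand over.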

This, of course, isn't free. There is an associated overhead. It's a
tradeoff, like most things are. We want to work towards the point on
that tradeoff that makes sense: we don't want to incur too much
overhead, but we also don't want a very complicated and error-prone
implementation.

Rik, what are your thoughts on this proposal, and how do you think it
can be improved?
