Re: [LSF/MM/BPF TOPIC] Swap Abstraction "the pony"

On Fri, Mar 1, 2024 at 4:24 PM Chris Li <chrisl@xxxxxxxxxx> wrote:
>
> In last year's LSF/MM I talked about a VFS-like swap system. That is
> the pony that was chosen.
> However, I did not have much chance to go into details.

I'd love to attend this talk/chat :)

>
> This year, I would like to discuss what it takes to re-architect the
> whole swap back end from scratch.
>
> Let’s start from the requirements for the swap back end.
>
> 1) support the existing swap usage (not the implementation).
>
> Some other design goals:
>
> 2) low per swap entry memory usage.
>
> 3) low I/O latency.
>
> What are the functions the swap system needs to support?
>
> At the device level, the swap system needs to support a list of swap
> files in priority order. Swap devices of the same priority are
> written to round robin. The swap device types include zswap, zram,
> SSD, spinning hard disk, and a swap file in a file system.
>
> At the swap entry level, here is the list of existing swap entry usage:
>
> * Swap entry allocation and free. Each swap entry needs to be
> associated with a location of the disk space in the swapfile. (offset
> of swap entry).
> * Each swap entry needs to track the map count of the entry. (swap_map)
> * Each swap entry needs to be able to find the associated memory
> cgroup. (swap_cgroup_ctrl->map)
> * Swap cache. Look up the folio/shadow from the swap entry.
> * Swap page writes through a swapfile in a file system other than a
> block device. (swap_extent)
> * Shadow entry. (stored in the swap cache)

IMHO, one thing this new abstraction should support is seamless
transfer/migration of pages from one backend to another (perhaps from
high- to low-priority backends, i.e., writeback).

I think this will require some careful redesign. The closest thing we
have right now is zswap -> backing swapfile. But it is currently
handled in a rather peculiar manner: the underlying swap slot has
already been reserved for the zswap entry. There are a couple of
problems with this:

a) This is wasteful. The same piece of data essentially occupies
space at two levels of the hierarchy.
b) How do we generalize to a multi-tier hierarchy?
c) This is a bit too backend-specific. It'd be nice if we could make
this as backend-agnostic as possible.
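
One way to picture what (a)-(c) are asking for is a level of
indirection: a stable "virtual" swap entry that points at whichever
backend currently holds the data, with a slot allocated in a lower tier
only at the moment of writeback. The following is a toy userspace
model, not kernel code. Every name in it (vswap_entry, vswap_store,
vswap_migrate, the tier numbering) is hypothetical, the data copy is
elided, and there is no locking, so it says nothing about the
concurrent-load and swapcache questions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model; none of these names are existing kernel API. */
#define NR_TIERS   3      /* e.g. 0 = compressed, 1 = SSD, 2 = HDD */
#define NR_SLOTS   4      /* slots per tier, tiny for illustration */
#define NR_ENTRIES 8      /* number of virtual swap entries */
#define TIER_NONE  (-1)

struct tier { bool used[NR_SLOTS]; };       /* per-backend slot bitmap */
struct vswap_entry { int tier; int slot; }; /* current location of the data */

static struct tier tiers[NR_TIERS];
static struct vswap_entry entries[NR_ENTRIES];

static int tier_alloc_slot(int t)
{
    for (int i = 0; i < NR_SLOTS; i++) {
        if (!tiers[t].used[i]) {
            tiers[t].used[i] = true;
            return i;
        }
    }
    return -1; /* tier is full */
}

static void tier_free_slot(int t, int slot)
{
    assert(tiers[t].used[slot]);
    tiers[t].used[slot] = false;
}

int tier_used_count(int t)
{
    int n = 0;
    for (int i = 0; i < NR_SLOTS; i++)
        n += tiers[t].used[i];
    return n;
}

int vswap_tier_of(int id) { return entries[id].tier; }

void vswap_init(void)
{
    for (int t = 0; t < NR_TIERS; t++)
        for (int i = 0; i < NR_SLOTS; i++)
            tiers[t].used[i] = false;
    for (int id = 0; id < NR_ENTRIES; id++) {
        entries[id].tier = TIER_NONE;
        entries[id].slot = -1;
    }
}

/* Place a new entry in tier t. No slot is reserved in any other tier. */
int vswap_store(int id, int t)
{
    if (id < 0 || id >= NR_ENTRIES || t < 0 || t >= NR_TIERS)
        return -1;
    if (entries[id].tier != TIER_NONE)
        return -1;
    int slot = tier_alloc_slot(t);
    if (slot < 0)
        return -1;
    entries[id].tier = t;
    entries[id].slot = slot;
    return 0;
}

/* Move an entry to tier dst; the entry id itself never changes. */
int vswap_migrate(int id, int dst)
{
    if (id < 0 || id >= NR_ENTRIES || dst < 0 || dst >= NR_TIERS)
        return -1;
    int src = entries[id].tier, old = entries[id].slot;
    if (src == TIER_NONE || src == dst)
        return -1;
    int slot = tier_alloc_slot(dst);  /* allocated only at writeback time */
    if (slot < 0)
        return -1;
    /* A real implementation would copy the page data from src to dst here. */
    entries[id].tier = dst;
    entries[id].slot = slot;
    tier_free_slot(src, old);         /* data now lives in exactly one tier */
    return 0;
}
```

In this model, storing entry 0 in tier 0 and then calling
vswap_migrate(0, 1) leaves tier 0 with no used slots and tier 1 with
one, and the same call works unchanged for any pair of tiers. Whether
such a lookup table fits the "low per swap entry memory usage" goal is
a separate question that the sketch does not address.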

Motivation: I'm currently working on/thinking about decoupling zswap
and swap, and this is one of the more challenging aspects (I can't seem
to find a precedent in the swap world for page migration between swap
backends), especially with respect to concurrent loads and swapcache
interactions.

I don't have good answers/designs quite yet - just raising some
questions/concerns :)




