Re: [LSF/MM/BPF TOPIC] Swap Abstraction "the pony"

Jared Hulbert <jaredeh@xxxxxxxxx> · Thu, 7 Mar 2024 00:57:17 -0800

On Wed, Mar 6, 2024 at 4:46 PM Chris Li <chrisl@xxxxxxxxxx> wrote:
>
> OK, you are suggesting not using file inodes for 4K swap pages.
> Also not design our own data structure to manage swap entry allocation.
>
> Then how do you allocate swap entries using this file system or database?
> More detail on how swap entries map into the large files offsets can
> help me understand what you are trying to do.
>
> Swap file support exists in the kernel. You can block IO on the swap
> device with a given offset. The block device API exists.  That is how
> the swap back end works right now. I am not sure I understand your
> question.

To apply the database model to the problems of fragmentation and mTHP
you could have a file for every page size.  All the offsets within a
given file would then be aligned to that file's page size.  This is
similar to what Chuanhua Han is proposing for the swap device in
another subthread.
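
A minimal sketch of the aligned-offset idea, assuming each size class
gets its own backing file and a swap entry is simply a slot index into
that file.  The size classes, the names (PAGE_SIZES, slot_offset,
slot_of) and the slot-index scheme are hypothetical illustrations, not
something specified in this thread.

```python
# Hypothetical size classes: one backing file per entry in this table.
PAGE_SIZES = {
    "4k": 4 * 1024,
    "64k": 64 * 1024,  # example mTHP size, chosen only for illustration
}


def slot_offset(size_class: str, slot: int) -> int:
    """Return the byte offset of `slot` in the file for `size_class`.

    Every slot in a file has the same size, so the offset is always a
    multiple of that file's page size.
    """
    if slot < 0:
        raise ValueError("slot index must be non-negative")
    return slot * PAGE_SIZES[size_class]


def slot_of(size_class: str, offset: int) -> int:
    """Inverse mapping: recover the slot index from an aligned offset."""
    size = PAGE_SIZES[size_class]
    if offset < 0 or offset % size != 0:
        raise ValueError("offset is not aligned to the page size")
    return offset // size
```

Because no file ever mixes entry sizes, freeing one slot always leaves a
hole that fits exactly one new entry of that size.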

Here is an example of how a filesystem would make this easy.  Assume
you have a 20GB filesystem and you set it up with a 10GB file for 4KB
pages and a 10GB file for mTHP.  Over time the workload changes: the
4KB file is only using 2GB while mTHP needs more space, so you decide
to give mTHP another 5GB, taking it from the 4KB file.  However, while
the 4KB file is largely unused, there is a valid entry at the last
offset, so you can't truncate the file without moving entries.  If
instead you fallocate(FALLOC_FL_PUNCH_HOLE) each range as you free its
entries, the 4KB file becomes sparse and only consumes disk blocks for
its live entries, so you can easily grow the mTHP file to 15GB.  You
end up with 25GB of logical space on the 20GB disk, no problem.
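
The hole punching step can be sketched from userspace as follows.  This
is only an illustration under stated assumptions: 64-bit Linux (so
off_t is 64 bits), a libc that exports fallocate(), and a filesystem
that supports hole punching.  The helper names punch_hole and demo are
hypothetical.  Note that fallocate(2) requires FALLOC_FL_PUNCH_HOLE to
be ORed with FALLOC_FL_KEEP_SIZE, which is exactly what keeps the valid
entry at the last offset where it is.

```python
import ctypes
import os
import tempfile

# Flag values from <linux/falloc.h>.
FALLOC_FL_KEEP_SIZE = 0x01
FALLOC_FL_PUNCH_HOLE = 0x02

# Assumption: 64-bit Linux, so off_t can be passed as a 64-bit integer.
_libc = ctypes.CDLL(None, use_errno=True)
_libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                            ctypes.c_int64, ctypes.c_int64]
_libc.fallocate.restype = ctypes.c_int


def punch_hole(fd: int, offset: int, length: int) -> None:
    """Free the blocks backing [offset, offset + length), keeping the file size."""
    mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE
    if _libc.fallocate(fd, mode, offset, length) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def demo(directory=None, slot_size=4096, slots=256):
    """Fill a file, free every slot except the last one, and report usage.

    Returns (logical_size, allocated_before, allocated_after) in bytes.
    """
    with tempfile.TemporaryFile(dir=directory) as f:
        fd = f.fileno()
        f.write(b"\xaa" * (slot_size * slots))
        f.flush()
        os.fsync(fd)
        before = os.fstat(fd).st_blocks * 512  # st_blocks counts 512-byte units

        # The last slot stays valid, so truncating is not an option;
        # punching out everything before it still returns the space.
        punch_hole(fd, 0, slot_size * (slots - 1))
        os.fsync(fd)
        st = os.fstat(fd)
        return st.st_size, before, st.st_blocks * 512
```

On a filesystem that supports hole punching, the logical size returned
by demo() is unchanged while the allocated size drops.  In the example
above, the 4KB file stays 10GB long but only occupies about 2GB, and
2GB + 15GB still fits on the 20GB disk.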