On Wed, Mar 6, 2024 at 4:46 PM Chris Li <chrisl@xxxxxxxxxx> wrote:
>
> OK, you are suggesting not using file inodes for 4K swap pages.
> Also not design our own data structure to manage swap entry allocation.
>
> Then how do you allocate swap entries using this file system or database?
> More detail on how swap entries map into the large files offsets can
> help me understand what you are trying to do.
>
> Swap file support exists in the kernel. You can block IO on the swap
> device with a given offset. The block device API exists. That is how
> the swap back end works right now. I am not sure I understand your
> question.

To apply the database model to the problems of fragmentation and mTHP,
you could have a file for every page size. All your offsets would be
aligned, similar to what Chuanhua Han is proposing for the swap device
in another subthread.

Here is an example of how filesystems would make this all so easy.
Let's assume you have a 20GB filesystem, so you set it up with a 10GB
file for 4KB pages and a 10GB file for mTHP. Then, over time, workloads
change: the 4KB file is only using 2GB while the mTHP file needs more
space, so you decide to add 5GB to the mTHP file, taking it from the
4KB file. However, while the 4KB file is largely unutilized, there is
still a valid entry at the last offset, so you can't truncate the file
without moving entries. If you fallocate(FALLOC_FL_PUNCH_HOLE) when you
free entries, then you end up with a sparse file and can easily grow
the mTHP file to 15GB. You end up with 25GB of logical space on the
20GB disk, no problem.
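
For what it's worth, here is a rough userspace sketch of the punch-hole
idea. It is only an illustration, not code from any patch in this
thread. The helper names release_entry(), grow_file() and demo(), the
4096-byte ENTRY_SIZE and the /tmp path are all assumptions made up for
the example. It uses the Linux fallocate(2) call, where
FALLOC_FL_PUNCH_HOLE has to be combined with FALLOC_FL_KEEP_SIZE, and it
assumes the underlying filesystem supports hole punching.

```c
#define _GNU_SOURCE            /* fallocate() and FALLOC_FL_* come from <fcntl.h> */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define ENTRY_SIZE 4096        /* assumed entry size for the 4KB file */

/* Hypothetical helper: drop the blocks behind one freed entry.
 * PUNCH_HOLE must be ORed with KEEP_SIZE, so the file length is unchanged. */
static int release_entry(int fd, off_t slot)
{
    assert(slot >= 0);
    return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                     slot * ENTRY_SIZE, ENTRY_SIZE);
}

/* Hypothetical helper: raise the logical size; the new range is a hole. */
static int grow_file(int fd, off_t new_size)
{
    return ftruncate(fd, new_size);
}

/*
 * Small self-check: write three entries, free the middle one, then grow.
 * Returns 0 on success, 1 if hole punching is not supported here,
 * and -1 on any other failure.
 */
static int demo(void)
{
    char path[] = "/tmp/swap-4k-XXXXXX";
    char page[ENTRY_SIZE];
    struct stat st;
    off_t slot;
    int ret = -1;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);              /* the file goes away once fd is closed */

    memset(page, 0xAB, sizeof(page));
    for (slot = 0; slot < 3; slot++)
        if (pwrite(fd, page, sizeof(page), slot * ENTRY_SIZE) != (ssize_t)sizeof(page))
            goto out;

    if (release_entry(fd, 1) != 0) {
        ret = (errno == EOPNOTSUPP || errno == ENOSYS) ? 1 : -1;
        goto out;
    }

    /* The punched entry reads back as zeroes; the last entry is still valid. */
    if (pread(fd, page, sizeof(page), 1 * ENTRY_SIZE) != (ssize_t)sizeof(page) ||
        page[0] != 0)
        goto out;
    if (pread(fd, page, sizeof(page), 2 * ENTRY_SIZE) != (ssize_t)sizeof(page) ||
        (unsigned char)page[0] != 0xAB)
        goto out;

    /* Punching kept the length; growing changes only the logical size. */
    if (fstat(fd, &st) != 0 || st.st_size != 3 * ENTRY_SIZE)
        goto out;
    if (grow_file(fd, 8 * ENTRY_SIZE) != 0)
        goto out;
    if (fstat(fd, &st) != 0 || st.st_size != 8 * ENTRY_SIZE)
        goto out;
    ret = 0;
out:
    close(fd);
    return ret;
}
```

The point of the sketch is that the valid entry at the last offset never
has to move: freeing an entry gives its blocks back to the filesystem,
and growing the other file only changes its logical size. In the example
above that is how 10GB + 15GB = 25GB of logical space can sit on a 20GB
disk, as long as the blocks actually in use fit.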