Hi Andreas,

On Thu, Feb 22, 2024 at 7:03 PM Andreas Dilger <adilger@xxxxxxxxx> wrote:
>
> On Feb 22, 2024, at 3:45 PM, Chris Li <chrisl@xxxxxxxxxx> wrote:
> >
> > Hi David,
> >
> > On Fri, Feb 2, 2024 at 1:10 AM David Howells <dhowells@xxxxxxxxxx> wrote:
> >>
> >> Hi,
> >>
> >> The topic came up in a recent discussion about how to deal with large folios
> >> when it comes to swap as a swap device is normally considered a simple array
> >> of PAGE_SIZE-sized elements that can be indexed by a single integer.
> >
> > Sorry for being late for the party. I think I was the one that brought
> > this topic up in the online discussion with Will and You. Let me know
> > if you are referring to a different discussion.
> >
> >>
> >> With the advent of large folios, however, we might need to change this in
> >> order to be better able to swap out a compound page efficiently. Swap
> >> fragmentation raises its head, as does the need to potentially save multiple
> >> indices per folio. Does swap need to grow more filesystem features?
> >
> > Yes, with a large folio, it is harder to allocate continuous swap
> > entries where 4K swap entries are allocated and free all the time. The
> > fragmentation will likely make the swap file have very little
> > continuous swap entries.
>
> One option would be to reuse the multi-block allocator (mballoc) from
> ext4, which has quite efficient power-of-two buddy allocation. That
> would naturally aggregate contiguous pages as they are freed. Since
> the swap partition is not containing anything useful across a remount
> there is no need to save allocation bitmaps persistently.

That is a very interesting idea.

I see two ways to solve this problem, and a buddy allocation system is
one of them. A buddy allocator can preserve the assumption that the
swap entries of a folio are contiguous. The buddy system also has its
own limits due to external fragmentation. For one, there is no easy
way to relocate a swap entry to another location.
We don't have an rmap for swap entries, which makes them hard to
compact. Even so, I expect a buddy allocator can greatly reduce the
fragmentation.

The other way is to add a layer of indirection that maps a folio's
swap entry to discontiguous swap entries. That would break more of the
current code's assumptions about contiguous swap entries.

If we can reuse the ext4 mballoc for swap entries, that would be great.
I will take a look at it and report back.

Thanks for the great suggestion.

Chris
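A minimal sketch of the power-of-two buddy idea discussed above, assuming a textbook binary buddy allocator over swap slot indices. This is not ext4's mballoc code, and `SwapBuddyAllocator`, `alloc` and `free` are hypothetical names. A folio of 2**order pages gets 2**order contiguous slots, and a freed block merges with its buddy (the block whose start index differs only in bit `order`), which is how freed slots aggregate back into larger contiguous runs. All state lives in memory, matching the point that swap contents do not survive a remount.

```python
class SwapBuddyAllocator:
    """Hypothetical binary buddy allocator over swap slot indices.

    Not ext4's mballoc: a textbook buddy scheme kept purely in memory,
    since nothing on a swap device needs to survive a remount.
    """

    def __init__(self, max_order):
        self.max_order = max_order
        # free_lists[k] holds the start indices of free blocks of 2**k slots.
        self.free_lists = [set() for _ in range(max_order + 1)]
        # The whole device starts out as one free block of 2**max_order slots.
        self.free_lists[max_order].add(0)

    def alloc(self, order):
        """Return the first slot of 2**order contiguous slots, or None."""
        for k in range(order, self.max_order + 1):
            if self.free_lists[k]:
                start = min(self.free_lists[k])  # lowest address, for determinism
                self.free_lists[k].remove(start)
                # Split the block down to the requested order; the upper half
                # of each split goes back on the next-smaller free list.
                while k > order:
                    k -= 1
                    self.free_lists[k].add(start + (1 << k))
                return start
        # No contiguous run is large enough: external fragmentation.
        return None

    def free(self, start, order):
        """Release a block, merging it with its buddy for as long as possible."""
        while order < self.max_order:
            buddy = start ^ (1 << order)
            if buddy not in self.free_lists[order]:
                break  # buddy is still in use (or split), stop merging
            self.free_lists[order].remove(buddy)
            start = min(start, buddy)
            order += 1
        self.free_lists[order].add(start)
```

For example, allocating all 16 slots of a `max_order=4` device one page at a time and then freeing every other slot leaves 8 free slots but no free order-1 block, so `alloc(1)` returns `None`. That is the external fragmentation mentioned above: without an rmap for swap entries, nothing can move the live slots to close the holes.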