On Wed, Feb 28, 2024 at 09:37:06AM +0000, Ryan Roberts wrote:
> Fundamentally, we would like to be able to figure out the size of the swap slot
> from the swap entry. Today swap supports 2 sizes; PAGE_SIZE and PMD_SIZE. For
> PMD_SIZE, it always uses a full cluster, so can easily add a flag to the cluster
> to mark it as PMD_SIZE.
>
> Going forwards, we want to support all sizes (power-of-2). Most of the time, a
> cluster will contain only one size of THPs, but this is not the case when a THP
> in the swapcache gets split or when an order-0 slot gets stolen. We expect these
> cases to be rare.
>
> 1) Keep the size of the smallest swap entry in the cluster header. Most of the
> time it will be the full size of the swap entry, but sometimes it will cover
> only a portion. In the latter case you may see a false negative for
> swap_page_trans_huge_swapped() meaning we take the slow path, but that is rare.
> There is one wrinkle: currently the HUGE flag is cleared in put_swap_folio(). We
> wouldn't want to do the equivalent in the new scheme (i.e. set the whole cluster
> to order-0). I think that is safe, but haven't completely convinced myself yet.
>
> 2) allocate 4 bits per (small) swap slot to hold the order. This will give
> precise information and is conceptually simpler to understand, but will cost
> more memory (half as much as the initial swap_map[] again).
>
> I still prefer to avoid this at all if we can (and would like to hear Huang's
> thoughts). But if its a choice between 1 and 2, I prefer 1 - I'll do some
> prototyping.

I can't quite bring myself to look up the encoding of swap entries
but as long as we're willing to restrict ourselves to naturally aligning
the clusters, there's an encoding (which I believe I invented) that lets
us encode arbitrary power-of-two sizes with a single bit. I describe it
here:

https://kernelnewbies.org/MatthewWilcox/NaturallyAlignedOrder

Let me know if it's not clear.
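
For anyone who would rather see the idea than follow the link, here is a
minimal sketch of how a single extra bit can carry the order of a naturally
aligned entry. It assumes the usual trick: an index aligned to 2^order has
`order` zero low bits, so after shifting left by one those bits can be filled
with ones and the order read back as the count of trailing ones. This is an
illustration of the general idea, not necessarily the exact scheme on the
linked page, and the nao_* helper names are hypothetical (they are not kernel
API and say nothing about the real swap entry layout).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers, for illustration only.
 *
 * Encoded layout (low bits on the right):
 *   order 0:  iiii iiii 0
 *   order 1:  iiii iii0 1
 *   order 2:  iiii ii01 1
 * The single extra bit is the zero that terminates the run of ones.
 */

/* Pack a naturally aligned index and its order into one word. */
static inline uint64_t nao_encode(uint64_t index, unsigned int order)
{
	assert(order < 63);
	uint64_t low_ones = (UINT64_C(1) << order) - 1;

	assert((index & low_ones) == 0);	/* must be naturally aligned */
	assert((index >> 63) == 0);		/* need room for the extra bit */
	return (index << 1) | low_ones;
}

/* The order is the number of trailing one bits. */
static inline unsigned int nao_order(uint64_t enc)
{
	unsigned int order = 0;

	while (enc & 1) {
		enc >>= 1;
		order++;
	}
	return order;
}

/* Clear the trailing ones, then drop the extra bit to recover the index. */
static inline uint64_t nao_index(uint64_t enc)
{
	return (enc & (enc + 1)) >> 1;
}
```

With this sketch, index 16 at order 4 encodes as 0b101111: four trailing ones
give the order, and clearing them and shifting right by one gives back 16. The
cost is one bit of index space regardless of how many orders are supported,
which is the attraction compared with a separate per-slot order field.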