On 22/02/2024 10:20, David Hildenbrand wrote: > On 22.02.24 11:19, David Hildenbrand wrote: >> On 25.10.23 16:45, Ryan Roberts wrote: >>> As preparation for supporting small-sized THP in the swap-out path, >>> without first needing to split to order-0, Remove the CLUSTER_FLAG_HUGE, >>> which, when present, always implies PMD-sized THP, which is the same as >>> the cluster size. >>> >>> The only use of the flag was to determine whether a swap entry refers to >>> a single page or a PMD-sized THP in swap_page_trans_huge_swapped(). >>> Instead of relying on the flag, we now pass in nr_pages, which >>> originates from the folio's number of pages. This allows the logic to >>> work for folios of any order. >>> >>> The one snag is that one of the swap_page_trans_huge_swapped() call >>> sites does not have the folio. But it was only being called there to >>> avoid bothering to call __try_to_reclaim_swap() in some cases. >>> __try_to_reclaim_swap() gets the folio and (via some other functions) >>> calls swap_page_trans_huge_swapped(). So I've removed the problematic >>> call site and believe the new logic should be equivalent. >> >> That is the __try_to_reclaim_swap() -> folio_free_swap() -> >> folio_swapped() -> swap_page_trans_huge_swapped() call chain I assume. >> >> The "difference" is that you will now (1) get another temporary >> reference on the folio and (2) (try)lock the folio every time you >> discard a single PTE of a (possibly) large THP. >> > > Thinking about it, your change will not only affect THP, but any call to > free_swap_and_cache(). > > Likely that's not what we want. :/ > Is folio_trylock() really that expensive given the code path is already locking multiple spinlocks, and I don't think we would expect the folio lock to be very contended? I guess filemap_get_folio() could be a bit more expensive, but again, is this really a deal-breaker? I'm just trying to refamiliarize myself with this series, but I think I ended up allocating a cluster per cpu per order. So one potential solution would be to turn the flag into a size and store it in the cluster info. (In fact I think I was doing that in an early version of this series - will have to look at why I got rid of that). Then we could avoid needing to figure out nr_pages from the folio.