On Tue, Aug 6, 2024 at 2:01 PM zhiguojiang <justinjiang@xxxxxxxx> wrote:
>
>
>
> On 2024/8/6 6:09, Barry Song wrote:
> > On Tue, Aug 6, 2024 at 4:08 AM Zhiguo Jiang <justinjiang@xxxxxxxx> wrote:
> >> Support mTHP's attempt to free swap entries as a whole, which can avoid
> >> frequent swap_info locking for every individual entry in
> >> swapcache_free_entries(). When the swap_map count values corresponding
> >> to all contiguous entries are all zero excluding SWAP_HAS_CACHE, the
> >> entries will be freed directly by skipping percpu swp_slots caches.
> >>
> > No, this isn't quite good. Please review the work done by Chris and Kairui[1];
> > they have handled it better. On a different note, I have a patch that can
> > handle zap_pte_range() for swap entries in batches[2][3].
> I'm glad to see your optimized submission about batch freeing swap
> entries for zap_pte_range(). Sorry, I didn't see it before. This patch
> of mine can be ignored.

No worries. Please help test and review the formal patch I sent:
https://lore.kernel.org/linux-mm/20240806012409.61962-1-21cnbao@xxxxxxxxx/

Please note that I didn't use a bitmap, in order to avoid a large stack.
Also, there is a real possibility that the condition below can occur, and
your patch can crash when it is true:

nr > SWAPFILE_CLUSTER - offset % SWAPFILE_CLUSTER

Additionally, I quickly skip the case where
swap_count(data_race(si->swap_map[start_offset])) != 1 to avoid
regressions in cases that can't be batched.
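
To illustrate the boundary issue, here is a rough sketch only (the helper
name range_covered_by_one_cluster() is made up for this mail, it is not
something in the tree):

/*
 * Illustration: a run of nr entries starting at 'offset' is covered by a
 * single cluster only when it does not cross a SWAPFILE_CLUSTER boundary.
 */
static inline bool range_covered_by_one_cluster(unsigned long offset, int nr)
{
        /* entries left before the next cluster boundary */
        unsigned long remaining = SWAPFILE_CLUSTER - offset % SWAPFILE_CLUSTER;

        /*
         * If nr > remaining, the range spans two clusters, so a single
         * lock_cluster_or_swap_info() call and a bitmap sized for one
         * cluster no longer cover every entry in the range.
         */
        return nr <= remaining;
}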
>
> Thanks
> Zhiguo
>
> >
> > [1] https://lore.kernel.org/linux-mm/20240730-swap-allocator-v5-5-cb9c148b9297@xxxxxxxxxx/
> > [2] https://lore.kernel.org/linux-mm/20240803091118.84274-1-21cnbao@xxxxxxxxx/
> > [3] https://lore.kernel.org/linux-mm/CAGsJ_4wPnQqKOHx6iQcwO8bQzoBXKr2qY2AgSxMwTQCj3-8YWw@xxxxxxxxxxxxxx/
> >
> >> Signed-off-by: Zhiguo Jiang <justinjiang@xxxxxxxx>
> >> ---
> >>  mm/swapfile.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++++++
> >>  1 file changed, 61 insertions(+)
> >>
> >> diff --git a/mm/swapfile.c b/mm/swapfile.c
> >> index ea023fc25d08..829fb4cfb6ec
> >> --- a/mm/swapfile.c
> >> +++ b/mm/swapfile.c
> >> @@ -1493,6 +1493,58 @@ static void swap_entry_range_free(struct swap_info_struct *p, swp_entry_t entry,
> >>          swap_range_free(p, offset, nr_pages);
> >>  }
> >>
> >> +/*
> >> + * Free the contiguous swap entries as a whole; the caller has to
> >> + * ensure all entries belong to the same folio.
> >> + */
> >> +static void swap_entry_range_check_and_free(struct swap_info_struct *p,
> >> +                swp_entry_t entry, int nr, bool *any_only_cache)
> >> +{
> >> +        const unsigned long start_offset = swp_offset(entry);
> >> +        const unsigned long end_offset = start_offset + nr;
> >> +        unsigned long offset;
> >> +        DECLARE_BITMAP(to_free, SWAPFILE_CLUSTER) = { 0 };
> >> +        struct swap_cluster_info *ci;
> >> +        int i = 0, nr_setbits = 0;
> >> +        unsigned char count;
> >> +
> >> +        /*
> >> +         * Free and check swap_map count values corresponding to all contiguous
> >> +         * entries in the whole folio range.
> >> +         */
> >> +        WARN_ON_ONCE(nr > SWAPFILE_CLUSTER);
> >> +        ci = lock_cluster_or_swap_info(p, start_offset);
> >> +        for (offset = start_offset; offset < end_offset; offset++, i++) {
> >> +                if (data_race(p->swap_map[offset])) {
> >> +                        count = __swap_entry_free_locked(p, offset, 1);
> >> +                        if (!count) {
> >> +                                bitmap_set(to_free, i, 1);
> >> +                                nr_setbits++;
> >> +                        } else if (count == SWAP_HAS_CACHE) {
> >> +                                *any_only_cache = true;
> >> +                        }
> >> +                } else {
> >> +                        WARN_ON_ONCE(1);
> >> +                }
> >> +        }
> >> +        unlock_cluster_or_swap_info(p, ci);
> >> +
> >> +        /*
> >> +         * If the swap_map count values corresponding to all contiguous entries are
> >> +         * all zero excluding SWAP_HAS_CACHE, the entries will be freed directly by
> >> +         * skipping percpu swp_slots caches, which can avoid frequent swap_info
> >> +         * locking for every individual entry.
> >> +         */
> >> +        if (nr > 1 && nr_setbits == nr) {
> >> +                spin_lock(&p->lock);
> >> +                swap_entry_range_free(p, entry, nr);
> >> +                spin_unlock(&p->lock);
> >> +        } else {
> >> +                for_each_set_bit(i, to_free, SWAPFILE_CLUSTER)
> >> +                        free_swap_slot(swp_entry(p->type, start_offset + i));
> >> +        }
> >> +}
> >> +
> >>  static void cluster_swap_free_nr(struct swap_info_struct *sis,
> >>                  unsigned long offset, int nr_pages,
> >>                  unsigned char usage)
> >> @@ -1808,6 +1860,14 @@ void free_swap_and_cache_nr(swp_entry_t entry, int nr)
> >>          if (WARN_ON(end_offset > si->max))
> >>                  goto out;
> >>
> >> +        /*
> >> +         * Try to free all contiguous entries of the mTHP as a whole.
> >> +         */
> >> +        if (IS_ENABLED(CONFIG_THP_SWAP) && nr > 1) {
> >> +                swap_entry_range_check_and_free(si, entry, nr, &any_only_cache);
> >> +                goto free_cache;
> >> +        }
> >> +
> >>          /*
> >>           * First free all entries in the range.
> >>           */
> >> @@ -1821,6 +1881,7 @@ void free_swap_and_cache_nr(swp_entry_t entry, int nr)
> >>                  }
> >>          }
> >>
> >> +free_cache:
> >>          /*
> >>           * Short-circuit the below loop if none of the entries had their
> >>           * reference drop to zero.
> >> --
> >> 2.39.0
> >>

Thanks
Barry