On Wed, Aug 7, 2024 at 4:25 PM Barry Song <21cnbao@xxxxxxxxx> wrote: > > From: Barry Song <v-songbaohua@xxxxxxxx> > > Zhiguo reported that swap release could be a serious bottleneck > during process exits[1]. With mTHP, we have the opportunity to > batch free swaps. > Thanks to the work of Chris and Kairui[2], I was able to achieve > this optimization with minimal code changes by building on their > efforts. > If swap_count is 1, which is likely true as most anon memory are > private, we can free all contiguous swap slots all together. > > Ran the below test program for measuring the bandwidth of munmap > using zRAM and 64KiB mTHP: > > #include <sys/mman.h> > #include <sys/time.h> > #include <stdlib.h> > > unsigned long long tv_to_ms(struct timeval tv) > { > return tv.tv_sec * 1000 + tv.tv_usec / 1000; > } > > main() > { > struct timeval tv_b, tv_e; > int i; > #define SIZE 1024*1024*1024 > void *p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE, > MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); > if (!p) { > perror("fail to get memory"); > exit(-1); > } > > madvise(p, SIZE, MADV_HUGEPAGE); > memset(p, 0x11, SIZE); /* write to get mem */ > > madvise(p, SIZE, MADV_PAGEOUT); > > gettimeofday(&tv_b, NULL); > munmap(p, SIZE); > gettimeofday(&tv_e, NULL); > > printf("munmap in bandwidth: %ld bytes/ms\n", > SIZE/(tv_to_ms(tv_e) - tv_to_ms(tv_b))); > } > > The result is as below (munmap bandwidth): > mm-unstable mm-unstable-with-patch > round1 21053761 63161283 > round2 21053761 63161283 > round3 21053761 63161283 > round4 20648881 67108864 > round5 20648881 67108864 > > munmap bandwidth becomes 3X faster. Hi Barry, Thanks for the patch, I also noticed this could be optimized when working on the batch freeing of mthp pages in the series you mentioned, a very nice improvement. > > [1] https://lore.kernel.org/linux-mm/20240731133318.527-1-justinjiang@xxxxxxxx/ > [2] https://lore.kernel.org/linux-mm/20240730-swap-allocator-v5-0-cb9c148b9297@xxxxxxxxxx/ > > Cc: Kairui Song <kasong@xxxxxxxxxxx> > Cc: Chris Li <chrisl@xxxxxxxxxx> > Cc: "Huang, Ying" <ying.huang@xxxxxxxxx> > Cc: Hugh Dickins <hughd@xxxxxxxxxx> > Cc: Kalesh Singh <kaleshsingh@xxxxxxxxxx> > Cc: Ryan Roberts <ryan.roberts@xxxxxxx> > Cc: David Hildenbrand <david@xxxxxxxxxx> > Signed-off-by: Barry Song <v-songbaohua@xxxxxxxx> > --- > mm/swapfile.c | 78 +++++++++++++++++++++++++++++++++++++++++++-------- > 1 file changed, 67 insertions(+), 11 deletions(-) > > diff --git a/mm/swapfile.c b/mm/swapfile.c > index 35cb58373493..25c3f98fa8d5 100644 > --- a/mm/swapfile.c > +++ b/mm/swapfile.c > @@ -156,6 +156,25 @@ static bool swap_is_has_cache(struct swap_info_struct *si, > return true; > } > > +static bool swap_is_last_map(struct swap_info_struct *si, > + unsigned long offset, int nr_pages, > + bool *has_cache) > +{ > + unsigned char *map = si->swap_map + offset; > + unsigned char *map_end = map + nr_pages; > + bool cached = false; > + > + do { > + if ((*map & ~SWAP_HAS_CACHE) != 1) > + return false; I haven't tried this yet, but looking at this if. If a mthp or thp was split, and part of the slots are "1", rest of the slots are "HAS_CACHE | 1", this will also return true, is this a problem? These slots with "1" don't have an entry in the swap cache, so the following reclaim in free_swap_and_cache_nr might not work as expected, could they be stuck in HAS_CACHE state?