On Thu, Aug 8, 2024 at 4:16 AM Kairui Song <ryncsn@xxxxxxxxx> wrote: > > On Wed, Aug 7, 2024 at 4:25 PM Barry Song <21cnbao@xxxxxxxxx> wrote: > > > > From: Barry Song <v-songbaohua@xxxxxxxx> > > > > Zhiguo reported that swap release could be a serious bottleneck > > during process exits[1]. With mTHP, we have the opportunity to > > batch free swaps. > > Thanks to the work of Chris and Kairui[2], I was able to achieve > > this optimization with minimal code changes by building on their > > efforts. > > If swap_count is 1, which is likely true as most anon memory are > > private, we can free all contiguous swap slots all together. > > > > Ran the below test program for measuring the bandwidth of munmap > > using zRAM and 64KiB mTHP: > > > > #include <sys/mman.h> > > #include <sys/time.h> > > #include <stdlib.h> > > > > unsigned long long tv_to_ms(struct timeval tv) > > { > > return tv.tv_sec * 1000 + tv.tv_usec / 1000; > > } > > > > main() > > { > > struct timeval tv_b, tv_e; > > int i; > > #define SIZE 1024*1024*1024 > > void *p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE, > > MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); > > if (!p) { > > perror("fail to get memory"); > > exit(-1); > > } > > > > madvise(p, SIZE, MADV_HUGEPAGE); > > memset(p, 0x11, SIZE); /* write to get mem */ > > > > madvise(p, SIZE, MADV_PAGEOUT); > > > > gettimeofday(&tv_b, NULL); > > munmap(p, SIZE); > > gettimeofday(&tv_e, NULL); > > > > printf("munmap in bandwidth: %ld bytes/ms\n", > > SIZE/(tv_to_ms(tv_e) - tv_to_ms(tv_b))); > > } > > > > The result is as below (munmap bandwidth): > > mm-unstable mm-unstable-with-patch > > round1 21053761 63161283 > > round2 21053761 63161283 > > round3 21053761 63161283 > > round4 20648881 67108864 > > round5 20648881 67108864 > > > > munmap bandwidth becomes 3X faster. > > Hi Barry, > > Thanks for the patch, I also noticed this could be optimized when > working on the batch freeing of mthp pages in the series you > mentioned, a very nice improvement. > > > > > [1] https://lore.kernel.org/linux-mm/20240731133318.527-1-justinjiang@xxxxxxxx/ > > [2] https://lore.kernel.org/linux-mm/20240730-swap-allocator-v5-0-cb9c148b9297@xxxxxxxxxx/ > > > > Cc: Kairui Song <kasong@xxxxxxxxxxx> > > Cc: Chris Li <chrisl@xxxxxxxxxx> > > Cc: "Huang, Ying" <ying.huang@xxxxxxxxx> > > Cc: Hugh Dickins <hughd@xxxxxxxxxx> > > Cc: Kalesh Singh <kaleshsingh@xxxxxxxxxx> > > Cc: Ryan Roberts <ryan.roberts@xxxxxxx> > > Cc: David Hildenbrand <david@xxxxxxxxxx> > > Signed-off-by: Barry Song <v-songbaohua@xxxxxxxx> > > --- > > mm/swapfile.c | 78 +++++++++++++++++++++++++++++++++++++++++++-------- > > 1 file changed, 67 insertions(+), 11 deletions(-) > > > > diff --git a/mm/swapfile.c b/mm/swapfile.c > > index 35cb58373493..25c3f98fa8d5 100644 > > --- a/mm/swapfile.c > > +++ b/mm/swapfile.c > > @@ -156,6 +156,25 @@ static bool swap_is_has_cache(struct swap_info_struct *si, > > return true; > > } > > > > +static bool swap_is_last_map(struct swap_info_struct *si, > > + unsigned long offset, int nr_pages, > > + bool *has_cache) > > +{ > > + unsigned char *map = si->swap_map + offset; > > + unsigned char *map_end = map + nr_pages; > > + bool cached = false; > > + > > + do { > > + if ((*map & ~SWAP_HAS_CACHE) != 1) > > + return false; > > I haven't tried this yet, but looking at this if. If a mthp or thp was > split, and part of the slots are "1", rest of the slots are "HAS_CACHE > | 1", this will also return true, is this a problem? > > These slots with "1" don't have an entry in the swap cache, so the > following reclaim in free_swap_and_cache_nr might not work as > expected, could they be stuck in HAS_CACHE state? good catch. Kairui, Thanks! In this case, I am leaking swap slots whose count == 1. The original non-batch code will still free those slots in non-batched mode but my code will just write them to SWAP_HAS_CACHE. this can be fixed by: static bool swap_is_last_map(struct swap_info_struct *si, unsigned long offset, int nr_pages, bool *has_cache) { unsigned char *map = si->swap_map + offset; unsigned char *map_end = map + nr_pages; unsigned char count = *map; if (swap_count(count) != 1) return false; do { if (*map != count) return false; } while (++map < map_end); *has_cache = !!(count & SWAP_HAS_CACHE); return true; } Thanks Barry