On Thu, Jun 16, 2022 at 7:27 PM Miaohe Lin <linmiaohe@xxxxxxxxxx> wrote: > > On 2022/6/16 23:53, Yang Shi wrote: > > On Thu, Jun 16, 2022 at 12:42 AM Miaohe Lin <linmiaohe@xxxxxxxxxx> wrote: > >> > >> On 2022/6/16 7:58, Yang Shi wrote: > >>> On Sat, Jun 11, 2022 at 1:47 AM Miaohe Lin <linmiaohe@xxxxxxxxxx> wrote: > >>>> > >>>> Transhuge swapcaches won't be freed in __collapse_huge_page_copy(). > >>>> It's because release_pte_page() is not called for these pages and > >>>> thus free_page_and_swap_cache can't grab the page lock. These pages > >>>> won't be freed from swap cache even if we are the only user until > >>>> next time reclaim. It shouldn't hurt indeed, but we could try to > >>>> free these pages to save more memory for system. > >>> > >>> > >>>> > >>>> Signed-off-by: Miaohe Lin <linmiaohe@xxxxxxxxxx> > >>>> --- > >>>> include/linux/swap.h | 5 +++++ > >>>> mm/khugepaged.c | 1 + > >>>> mm/swap.h | 5 ----- > >>>> 3 files changed, 6 insertions(+), 5 deletions(-) > >>>> > >>>> diff --git a/include/linux/swap.h b/include/linux/swap.h > >>>> index 8672a7123ccd..ccb83b12b724 100644 > >>>> --- a/include/linux/swap.h > >>>> +++ b/include/linux/swap.h > >>>> @@ -456,6 +456,7 @@ static inline unsigned long total_swapcache_pages(void) > >>>> return global_node_page_state(NR_SWAPCACHE); > >>>> } > >>>> > >>>> +extern void free_swap_cache(struct page *page); > >>>> extern void free_page_and_swap_cache(struct page *); > >>>> extern void free_pages_and_swap_cache(struct page **, int); > >>>> /* linux/mm/swapfile.c */ > >>>> @@ -540,6 +541,10 @@ static inline void put_swap_device(struct swap_info_struct *si) > >>>> /* used to sanity check ptes in zap_pte_range when CONFIG_SWAP=0 */ > >>>> #define free_swap_and_cache(e) is_pfn_swap_entry(e) > >>>> > >>>> +static inline void free_swap_cache(struct page *page) > >>>> +{ > >>>> +} > >>>> + > >>>> static inline int add_swap_count_continuation(swp_entry_t swp, gfp_t gfp_mask) > >>>> { > >>>> return 0; > >>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c > >>>> index ee0a719c8be9..52109ad13f78 100644 > >>>> --- a/mm/khugepaged.c > >>>> +++ b/mm/khugepaged.c > >>>> @@ -756,6 +756,7 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page, > >>>> list_for_each_entry_safe(src_page, tmp, compound_pagelist, lru) { > >>>> list_del(&src_page->lru); > >>>> release_pte_page(src_page); > >>>> + free_swap_cache(src_page); > >>> > >>> Will this really work? The free_swap_cache() will just dec refcounts > >>> without putting the page back to buddy. So the hugepage is not > >>> actually freed at all. Am I missing something? > >> > >> Thanks for catching this! If page is on percpu lru_pvecs cache, page will > >> be released when lru_pvecs are drained. But if not, free_swap_cache() won't > >> free the page as it assumes the caller has a reference on the page and thus > >> only does page_ref_sub(). Does the below change looks sense for you? > > > > THP gets drained immediately so they won't stay in pagevecs. > > Yes, you're right. I missed this. > > > > >> > >> diff --git a/mm/khugepaged.c b/mm/khugepaged.c > >> index 52109ad13f78..b8c96e33591d 100644 > >> --- a/mm/khugepaged.c > >> +++ b/mm/khugepaged.c > >> @@ -755,8 +755,12 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page, > >> > >> list_for_each_entry_safe(src_page, tmp, compound_pagelist, lru) { > >> list_del(&src_page->lru); > >> - release_pte_page(src_page); > >> + mod_node_page_state(page_pgdat(src_page), > >> + NR_ISOLATED_ANON + page_is_file_lru(src_page), > >> + -compound_nr(src_page)); > >> + unlock_page(src_page); > >> free_swap_cache(src_page); > >> + putback_lru_page(src_page); > > > > I'm not sure if it is worth it or not for a rare corner case since THP > > should not stay in swapcache unless try_to_unmap() in vmscan fails > > IIUC, even if try_to_unmap() in vmscan succeeds, THP might be still in the > swapcache if shrink_page_list is not called for this THP again after writeback > is done, e.g. when shrink_page_list is called from madvise, so there might be I don't get, doesn't __remove_mapping() delete the page from swap cache? > no memory pressure, or do_swap_page puts the THP into page table again. Also THP do_swap_page() just swaps in base page, never THP. > might not be splited when deferred_split_shrinker is not called, e.g. due to I don't see how deferred split is related to this. > not lacking of memory. Even if there is memory pressure, the THP will stay in > swapcache until next round page reclaim for this THP is done. So there should > be a non-negligible window that THP will stay in the swapcache. > Or am I miss something? I guess you may misunderstand what I meant. This patch is trying to optimize freeing THP in swapcache. But it should be very rare that khugepaged sees THP from swap cache. The only case I could think of is try_to_unmap() in vmscan fails. That might leave THP in swap cache so that khugepaged could see it. > > > IIUC. And it is not guaranteed that free_swap_cache() will get the > > page lock. > > IMHO, we're not guaranteed that free_swap_cache() will get the page lock for the normal > page anyway. > > Thanks! > > > > >> } > >> } > >> > >> Thanks! > >> > >>> > >>>> } > >>>> } > >>>> > >>>> diff --git a/mm/swap.h b/mm/swap.h > >>>> index 0193797b0c92..863f6086c916 100644 > >>>> --- a/mm/swap.h > >>>> +++ b/mm/swap.h > >>>> @@ -41,7 +41,6 @@ void __delete_from_swap_cache(struct page *page, > >>>> void delete_from_swap_cache(struct page *page); > >>>> void clear_shadow_from_swap_cache(int type, unsigned long begin, > >>>> unsigned long end); > >>>> -void free_swap_cache(struct page *page); > >>>> struct page *lookup_swap_cache(swp_entry_t entry, > >>>> struct vm_area_struct *vma, > >>>> unsigned long addr); > >>>> @@ -81,10 +80,6 @@ static inline struct address_space *swap_address_space(swp_entry_t entry) > >>>> return NULL; > >>>> } > >>>> > >>>> -static inline void free_swap_cache(struct page *page) > >>>> -{ > >>>> -} > >>>> - > >>>> static inline void show_swap_cache_info(void) > >>>> { > >>>> } > >>>> -- > >>>> 2.23.0 > >>>> > >>>> > >>> . > >>> > >> > > . > > >