On Wed, 2017-03-08 at 15:26 +0800, Huang, Ying wrote: > From: Huang Ying <ying.huang@xxxxxxxxx> > > A variation of get_swap_page(), get_huge_swap_page(), is added to > allocate a swap cluster (HPAGE_PMD_NR swap slots) based on the swap > cluster allocation function. A fair simple algorithm is used, that is, > only the first swap device in priority list will be tried to allocate > the swap cluster. The function will fail if the trying is not > successful, and the caller will fallback to allocate a single swap slot > instead. This works good enough for normal cases. > > This will be used for the THP (Transparent Huge Page) swap support. > Where get_huge_swap_page() will be used to allocate one swap cluster for > each THP swapped out. > > Because of the algorithm adopted, if the difference of the number of the > free swap clusters among multiple swap devices is significant, it is > possible that some THPs are split earlier than necessary. For example, > this could be caused by big size difference among multiple swap devices. > > Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx> > Cc: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx> > Cc: Hugh Dickins <hughd@xxxxxxxxxx> > Cc: Shaohua Li <shli@xxxxxxxxxx> > Cc: Minchan Kim <minchan@xxxxxxxxxx> > Cc: Rik van Riel <riel@xxxxxxxxxx> > Signed-off-by: "Huang, Ying" <ying.huang@xxxxxxxxx> > --- > include/linux/swap.h | 19 ++++++++++++++++++- > mm/swap_slots.c | 5 +++-- > mm/swapfile.c | 16 ++++++++++++---- > 3 files changed, 33 insertions(+), 7 deletions(-) > > diff --git a/include/linux/swap.h b/include/linux/swap.h > index 278e1349a424..e3a7609a8989 100644 > --- a/include/linux/swap.h > +++ b/include/linux/swap.h > @@ -388,7 +388,7 @@ static inline long get_nr_swap_pages(void) > extern void si_swapinfo(struct sysinfo *); > extern swp_entry_t get_swap_page(void); > extern swp_entry_t get_swap_page_of_type(int); > -extern int get_swap_pages(int n, swp_entry_t swp_entries[]); > +extern int get_swap_pages(int n, swp_entry_t swp_entries[], bool huge); > extern int add_swap_count_continuation(swp_entry_t, gfp_t); > extern void swap_shmem_alloc(swp_entry_t); > extern int swap_duplicate(swp_entry_t); > @@ -527,6 +527,23 @@ static inline swp_entry_t get_swap_page(void) > > #endif /* CONFIG_SWAP */ > > +#ifdef CONFIG_THP_SWAP_CLUSTER > +static inline swp_entry_t get_huge_swap_page(void) > +{ > + swp_entry_t entry; > + > + if (get_swap_pages(1, &entry, true)) > + return entry; > + else > + return (swp_entry_t) {0}; > +} > +#else > +static inline swp_entry_t get_huge_swap_page(void) > +{ > + return (swp_entry_t) {0}; > +} > +#endif > + > #ifdef CONFIG_MEMCG > static inline int mem_cgroup_swappiness(struct mem_cgroup *memcg) > { > diff --git a/mm/swap_slots.c b/mm/swap_slots.c > index 9b5bc86f96ad..075bb39e03c5 100644 > --- a/mm/swap_slots.c > +++ b/mm/swap_slots.c > @@ -258,7 +258,8 @@ static int refill_swap_slots_cache(struct swap_slots_cache *cache) > > cache->cur = 0; > if (swap_slot_cache_active) > - cache->nr = get_swap_pages(SWAP_SLOTS_CACHE_SIZE, cache->slots); > + cache->nr = get_swap_pages(SWAP_SLOTS_CACHE_SIZE, cache->slots, > + false); > > return cache->nr; > } > @@ -334,7 +335,7 @@ swp_entry_t get_swap_page(void) > return entry; > } > > - get_swap_pages(1, &entry); > + get_swap_pages(1, &entry, false); > > return entry; > } > diff --git a/mm/swapfile.c b/mm/swapfile.c > index 91876c33114b..7241c937e52b 100644 > --- a/mm/swapfile.c > +++ b/mm/swapfile.c > @@ -904,11 +904,12 @@ static unsigned long scan_swap_map(struct swap_info_struct *si, > > } > > -int get_swap_pages(int n_goal, swp_entry_t swp_entries[]) > +int get_swap_pages(int n_goal, swp_entry_t swp_entries[], bool huge) > { > struct swap_info_struct *si, *next; > long avail_pgs; > int n_ret = 0; > + int nr_pages = huge_cluster_nr_entries(huge); > > avail_pgs = atomic_long_read(&nr_swap_pages); > if (avail_pgs <= 0) > @@ -920,6 +921,10 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[]) > if (n_goal > avail_pgs) > n_goal = avail_pgs; > > + n_goal *= nr_pages; I think if (n_goal > 1) when huge is true, n_goal should be set to huge_cluster_nr_entries(huge) here or we could have an invalid check below. We probably should add a comment to get_swap_pages on how we treat n_goal when huge is true. Maybe say we will always treat n_goal as SWAPFILE_CLUSTER when huge is true. > + if (avail_pgs < n_goal) > + goto noswap; > + > atomic_long_sub(n_goal, &nr_swap_pages); > > spin_lock(&swap_avail_lock); > @@ -946,10 +951,13 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[]) > spin_unlock(&si->lock); > goto nextsi; > } > - n_ret = scan_swap_map_slots(si, SWAP_HAS_CACHE, > - n_goal, swp_entries); > + if (likely(nr_pages == 1)) if (likely(!huge)) is probably more readable > + n_ret = scan_swap_map_slots(si, SWAP_HAS_CACHE, > + n_goal, swp_entries); > + else > + n_ret = swap_alloc_huge_cluster(si, swp_entries); > spin_unlock(&si->lock); Thanks. Tim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>