Johannes Weiner <hannes@xxxxxxxxxxx> writes: > On Tue, Mar 28, 2017 at 01:32:09PM +0800, Huang, Ying wrote: >> @@ -183,12 +184,53 @@ void __delete_from_swap_cache(struct page *page) >> ADD_CACHE_INFO(del_total, nr); >> } >> >> +#ifdef CONFIG_THP_SWAP_CLUSTER >> +int add_to_swap_trans_huge(struct page *page, struct list_head *list) >> +{ >> + swp_entry_t entry; >> + int ret = 0; >> + >> + /* cannot split, which may be needed during swap in, skip it */ >> + if (!can_split_huge_page(page, NULL)) >> + return -EBUSY; >> + /* fallback to split huge page firstly if no PMD map */ >> + if (!compound_mapcount(page)) >> + return 0; >> + entry = get_huge_swap_page(); >> + if (!entry.val) >> + return 0; >> + if (mem_cgroup_try_charge_swap(page, entry, HPAGE_PMD_NR)) { >> + __swapcache_free(entry, true); >> + return -EOVERFLOW; >> + } >> + ret = add_to_swap_cache(page, entry, >> + __GFP_HIGH | __GFP_NOMEMALLOC|__GFP_NOWARN); >> + /* -ENOMEM radix-tree allocation failure */ >> + if (ret) { >> + __swapcache_free(entry, true); >> + return 0; >> + } >> + ret = split_huge_page_to_list(page, list); >> + if (ret) { >> + delete_from_swap_cache(page); >> + return -EBUSY; >> + } >> + return 1; >> +} >> +#else >> +static inline int add_to_swap_trans_huge(struct page *page, >> + struct list_head *list) >> +{ >> + return 0; >> +} >> +#endif >> + >> /** >> * add_to_swap - allocate swap space for a page >> * @page: page we want to move to swap >> * >> * Allocate swap space for the page and add the page to the >> - * swap cache. Caller needs to hold the page lock. >> + * swap cache. Caller needs to hold the page lock. >> */ >> int add_to_swap(struct page *page, struct list_head *list) >> { >> @@ -198,6 +240,18 @@ int add_to_swap(struct page *page, struct list_head *list) >> VM_BUG_ON_PAGE(!PageLocked(page), page); >> VM_BUG_ON_PAGE(!PageUptodate(page), page); >> >> + if (unlikely(PageTransHuge(page))) { >> + err = add_to_swap_trans_huge(page, list); >> + switch (err) { >> + case 1: >> + return 1; >> + case 0: >> + /* fallback to split firstly if return 0 */ >> + break; >> + default: >> + return 0; >> + } >> + } >> entry = get_swap_page(); >> if (!entry.val) >> return 0; > > add_to_swap_trans_huge() is too close a copy of add_to_swap(), which > makes the code error prone for future modifications to the swap slot > allocation protocol. > > This should read: > > retry: > entry = get_swap_page(page); > if (!entry.val) { > if (PageTransHuge(page)) { > split_huge_page_to_list(page, list); > goto retry; > } > return 0; > } If the swap space is used up, that is, get_swap_page() cannot allocate even 1 swap entry for a normal page. We will split THP unnecessarily with the change, but in the original code, we just skip the THP. There may be a performance regression here. Similar problem exists for mem_cgroup_try_charge_swap() too. If the mem cgroup exceeds the swap limit, the THP will be split unnecessary with the change too. > And get_swap_page(), mem_cgroup_try_charge_swap() etc. should all > check PageTransHuge() instead of having extra parameters or separate > code paths for the huge page case. > > In general, don't try to tack this feature onto the side of the > VM. Because right now, this looks a bit like the hugetlb code, with > one big branch in the beginning that opens up an alternate > reality. Instead, these functions should handle THP all the way down > the stack, and without passing down redundant information. Yes. We should share the code as much as possible. I just have some questions as above. Could you help me on that? Best Regards, Huang, Ying -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>