+ mm-thp-swap-support-pmd-swap-mapping-in-free_swap_and_cache-swap_free.patch added to -mm tree

The patch titled
     Subject: mm, THP, swap: support PMD swap mapping in free_swap_and_cache()/swap_free()
has been added to the -mm tree.  Its filename is
     mm-thp-swap-support-pmd-swap-mapping-in-free_swap_and_cache-swap_free.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-thp-swap-support-pmd-swap-mapping-in-free_swap_and_cache-swap_free.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-thp-swap-support-pmd-swap-mapping-in-free_swap_and_cache-swap_free.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included in linux-next and is updated
there every 3-4 working days.

------------------------------------------------------
From: Huang Ying <ying.huang@xxxxxxxxx>
Subject: mm, THP, swap: support PMD swap mapping in free_swap_and_cache()/swap_free()

When a PMD swap mapping is removed from a huge swap cluster, for example
when a memory range mapped with a PMD swap mapping is unmapped,
free_swap_and_cache() will be called to decrease the reference count of
the huge swap cluster.  free_swap_and_cache() may also free or split the
huge swap cluster, and free the corresponding THP in the swap cache if
necessary.  swap_free() is similar and shares most of its implementation
with free_swap_and_cache().  This patch revises free_swap_and_cache() and
swap_free() to implement this by adding a bool "cluster" parameter that
selects the huge swap cluster path; all existing callers pass false.
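
To make the new calling convention concrete, here is a rough user-space
model (not kernel code) of the decision free_swap_and_cache() takes after
the reference has been dropped, following the mm/swapfile.c hunk of this
patch.  The enum, its values and model_after_put() are hypothetical
names; HAS_CACHE stands in for SWAP_HAS_CACHE, and the page lookup and
reclaim themselves are left out.

```c
#include <assert.h>
#include <stdbool.h>

#define HAS_CACHE 0x40	/* mirrors the SWAP_HAS_CACHE bit */

/* Hypothetical: what free_swap_and_cache() does next. */
enum model_action {
	MODEL_NOTHING,		/* other users remain, or the cluster path freed already */
	MODEL_FREE_SLOT,	/* single slot reached zero: free_swap_slot() */
	MODEL_TRY_RECLAIM,	/* only the swap cache holds it: try to drop the page */
};

/*
 * usage:        value returned by __swap_entry_free() or swap_free_cluster()
 * cluster:      the new bool argument
 * huge_swapped: result of swap_page_trans_huge_swapped()
 */
static enum model_action model_after_put(unsigned char usage, bool cluster,
					 bool huge_swapped)
{
	/* swap_free_cluster() only ever returns 0, 1 or SWAP_HAS_CACHE */
	assert(!cluster || usage <= 1 || usage == HAS_CACHE);

	if (usage == HAS_CACHE && !huge_swapped)
		return MODEL_TRY_RECLAIM;
	/* the cluster path frees its own slots, so only the PTE path does it here */
	if (usage == 0 && !cluster)
		return MODEL_FREE_SLOT;
	return MODEL_NOTHING;
}
```

The "&& !cluster" test mirrors the patch: when swap_free_cluster() returns
0 it has already handed the whole cluster back, so calling
free_swap_slot() again would be wrong.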

If the swap cluster has already been split, for example because a THP
could not be allocated during swapin, we simply decrement the reference
count of every swap slot in the cluster by one.
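
The split-cluster case can be modelled in user space roughly as below.
This is a sketch under stated assumptions, not the patch's code:
CLUSTER_SIZE is a hypothetical stand-in for SWAPFILE_CLUSTER, each map
byte holds a slot's swap count with the HAS_CACHE bit (mirroring
SWAP_HAS_CACHE) on top, and locking and swap count continuation are
ignored.

```c
#include <assert.h>

/* Hypothetical stand-ins, not the kernel's definitions. */
#define CLUSTER_SIZE 4	/* plays the role of SWAPFILE_CLUSTER */
#define HAS_CACHE 0x40	/* mirrors the SWAP_HAS_CACHE bit */

/*
 * Drop one reference from every slot of an already split cluster.
 * Returns 0 if every slot became free and 1 otherwise, the same
 * convention swap_free_cluster() uses for this case.
 */
static unsigned char model_put_split_cluster(unsigned char map[CLUSTER_SIZE])
{
	int free_entries = 0;

	for (int i = 0; i < CLUSTER_SIZE; i++) {
		unsigned char cache = map[i] & HAS_CACHE;
		unsigned char count = (unsigned char)(map[i] & ~HAS_CACHE);

		assert(count != 0);	/* like the new VM_BUG_ON(!count) */
		map[i] = (unsigned char)((count - 1) | cache);
		if (map[i] == 0)
			free_entries++;	/* the kernel calls free_swap_slot() here */
	}
	return !(free_entries == CLUSTER_SIZE);
}
```

Each slot is handled independently here because, once the cluster is no
longer huge, no cluster-wide PMD mapping count has to be kept in sync.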

Otherwise, we decrement the reference count of every swap slot by one,
and also the PMD swap mapping count kept in cluster_count().  If the
corresponding THP isn't in the swap cache, the huge swap cluster is split
once the PMD swap mapping count drops to 0, and freed once the swap count
of every slot drops to 0.  If the corresponding THP is in the swap cache
and every swap_map[offset] == SWAP_HAS_CACHE, we try to delete the THP
from the swap cache, which causes both the THP and the huge swap cluster
to be freed.
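
A similarly simplified user-space model of the huge (not yet split) case
is sketched below.  It assumes that cluster_count() holds SWAPFILE_CLUSTER
plus the number of PMD swap mappings, which is what
VM_BUG_ON(count < SWAPFILE_CLUSTER) in swap_free_cluster() suggests.
CLUSTER_SIZE, HAS_CACHE and struct model_cluster are hypothetical
stand-ins, and the temporary SWAP_MAP_BAD marking, locking and memcg
uncharging are left out.

```c
#include <assert.h>
#include <stdbool.h>

#define CLUSTER_SIZE 4	/* hypothetical stand-in for SWAPFILE_CLUSTER */
#define HAS_CACHE 0x40	/* mirrors the SWAP_HAS_CACHE bit */

/* Hypothetical model of one swap cluster. */
struct model_cluster {
	unsigned int count;		/* assumed: CLUSTER_SIZE + PMD swap mappings */
	bool huge;			/* cluster has not been split */
	unsigned char map[CLUSTER_SIZE];	/* per-slot swap count | HAS_CACHE */
};

/*
 * Drop one PMD swap mapping.  Returns 0 if the whole cluster became free,
 * HAS_CACHE if only the swap cache still holds every slot, 1 otherwise.
 */
static unsigned char model_put_pmd_mapping(struct model_cluster *ci)
{
	int free_entries = 0, cache_only = 0;

	assert(ci->huge);
	ci->count--;				/* one fewer PMD swap mapping */
	assert(ci->count >= CLUSTER_SIZE);	/* like VM_BUG_ON(count < SWAPFILE_CLUSTER) */

	for (int i = 0; i < CLUSTER_SIZE; i++) {
		unsigned char cache = ci->map[i] & HAS_CACHE;
		unsigned char cnt = (unsigned char)(ci->map[i] & ~HAS_CACHE);

		assert(cnt != 0);
		ci->map[i] = (unsigned char)((cnt - 1) | cache);
		if (ci->map[i] == 0)
			free_entries++;		/* last reference to this slot is gone */
		else if (ci->map[i] == HAS_CACHE)
			cache_only++;		/* only the swap cache still holds it */
	}
	/* Slots can only become free once no PMD mapping and no cached THP remain. */
	assert(!free_entries ||
	       (ci->count == CLUSTER_SIZE && !(ci->map[0] & HAS_CACHE)));

	if (free_entries == CLUSTER_SIZE)
		return 0;	/* the kernel frees the cluster via __swap_free_cluster() */
	/* No PMD swap mapping left and THP not in the swap cache: split. */
	if (ci->count == CLUSTER_SIZE && !(ci->map[0] & HAS_CACHE))
		ci->huge = false;
	return cache_only == CLUSTER_SIZE ? HAS_CACHE : 1;
}
```

Returning HAS_CACHE rather than 0 when only the swap cache holds the
slots is what lets free_swap_and_cache() go on to try removing the THP
from the swap cache.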

Link: http://lkml.kernel.org/r/20180622035151.6676-6-ying.huang@xxxxxxxxx
Signed-off-by: "Huang, Ying" <ying.huang@xxxxxxxxx>
Cc: "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx>
Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
Cc: Michal Hocko <mhocko@xxxxxxxx>
Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
Cc: Shaohua Li <shli@xxxxxxxxxx>
Cc: Hugh Dickins <hughd@xxxxxxxxxx>
Cc: Minchan Kim <minchan@xxxxxxxxxx>
Cc: Rik van Riel <riel@xxxxxxxxxx>
Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
Cc: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>
Cc: Zi Yan <zi.yan@xxxxxxxxxxxxxx>
Cc: Daniel Jordan <daniel.m.jordan@xxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---


diff -puN arch/s390/mm/pgtable.c~mm-thp-swap-support-pmd-swap-mapping-in-free_swap_and_cache-swap_free arch/s390/mm/pgtable.c
--- a/arch/s390/mm/pgtable.c~mm-thp-swap-support-pmd-swap-mapping-in-free_swap_and_cache-swap_free
+++ a/arch/s390/mm/pgtable.c
@@ -646,7 +646,7 @@ static void ptep_zap_swap_entry(struct m
 
 		dec_mm_counter(mm, mm_counter(page));
 	}
-	free_swap_and_cache(entry);
+	free_swap_and_cache(entry, false);
 }
 
 void ptep_zap_unused(struct mm_struct *mm, unsigned long addr,
diff -puN include/linux/swap.h~mm-thp-swap-support-pmd-swap-mapping-in-free_swap_and_cache-swap_free include/linux/swap.h
--- a/include/linux/swap.h~mm-thp-swap-support-pmd-swap-mapping-in-free_swap_and_cache-swap_free
+++ a/include/linux/swap.h
@@ -453,9 +453,9 @@ extern int add_swap_count_continuation(s
 extern void swap_shmem_alloc(swp_entry_t);
 extern int swap_duplicate(swp_entry_t *entry, bool cluster);
 extern int swapcache_prepare(swp_entry_t entry, bool cluster);
-extern void swap_free(swp_entry_t);
+extern void swap_free(swp_entry_t entry, bool cluster);
 extern void swapcache_free_entries(swp_entry_t *entries, int n);
-extern int free_swap_and_cache(swp_entry_t);
+extern int free_swap_and_cache(swp_entry_t entry, bool cluster);
 extern int swap_type_of(dev_t, sector_t, struct block_device **);
 extern unsigned int count_swap_pages(int, int);
 extern sector_t map_swap_page(struct page *, struct block_device **);
@@ -509,7 +509,8 @@ static inline void show_swap_cache_info(
 {
 }
 
-#define free_swap_and_cache(e) ({(is_migration_entry(e) || is_device_private_entry(e));})
+#define free_swap_and_cache(e, c)					\
+	({(is_migration_entry(e) || is_device_private_entry(e)); })
 #define swapcache_prepare(e, c)						\
 	({(is_migration_entry(e) || is_device_private_entry(e)); })
 
@@ -527,7 +528,7 @@ static inline int swap_duplicate(swp_ent
 	return 0;
 }
 
-static inline void swap_free(swp_entry_t swp)
+static inline void swap_free(swp_entry_t swp, bool cluster)
 {
 }
 
diff -puN kernel/power/swap.c~mm-thp-swap-support-pmd-swap-mapping-in-free_swap_and_cache-swap_free kernel/power/swap.c
--- a/kernel/power/swap.c~mm-thp-swap-support-pmd-swap-mapping-in-free_swap_and_cache-swap_free
+++ a/kernel/power/swap.c
@@ -182,7 +182,7 @@ sector_t alloc_swapdev_block(int swap)
 	offset = swp_offset(get_swap_page_of_type(swap));
 	if (offset) {
 		if (swsusp_extents_insert(offset))
-			swap_free(swp_entry(swap, offset));
+			swap_free(swp_entry(swap, offset), false);
 		else
 			return swapdev_block(swap, offset);
 	}
@@ -206,7 +206,7 @@ void free_all_swap_pages(int swap)
 		ext = rb_entry(node, struct swsusp_extent, node);
 		rb_erase(node, &swsusp_extents);
 		for (offset = ext->start; offset <= ext->end; offset++)
-			swap_free(swp_entry(swap, offset));
+			swap_free(swp_entry(swap, offset), false);
 
 		kfree(ext);
 	}
diff -puN mm/madvise.c~mm-thp-swap-support-pmd-swap-mapping-in-free_swap_and_cache-swap_free mm/madvise.c
--- a/mm/madvise.c~mm-thp-swap-support-pmd-swap-mapping-in-free_swap_and_cache-swap_free
+++ a/mm/madvise.c
@@ -349,7 +349,7 @@ static int madvise_free_pte_range(pmd_t
 			if (non_swap_entry(entry))
 				continue;
 			nr_swap--;
-			free_swap_and_cache(entry);
+			free_swap_and_cache(entry, false);
 			pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
 			continue;
 		}
diff -puN mm/memory.c~mm-thp-swap-support-pmd-swap-mapping-in-free_swap_and_cache-swap_free mm/memory.c
--- a/mm/memory.c~mm-thp-swap-support-pmd-swap-mapping-in-free_swap_and_cache-swap_free
+++ a/mm/memory.c
@@ -1382,7 +1382,7 @@ again:
 			page = migration_entry_to_page(entry);
 			rss[mm_counter(page)]--;
 		}
-		if (unlikely(!free_swap_and_cache(entry)))
+		if (unlikely(!free_swap_and_cache(entry, false)))
 			print_bad_pte(vma, addr, ptent, NULL);
 		pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
 	} while (pte++, addr += PAGE_SIZE, addr != end);
@@ -3065,7 +3065,7 @@ int do_swap_page(struct vm_fault *vmf)
 		activate_page(page);
 	}
 
-	swap_free(entry);
+	swap_free(entry, false);
 	if (mem_cgroup_swap_full(page) ||
 	    (vma->vm_flags & VM_LOCKED) || PageMlocked(page))
 		try_to_free_swap(page);
diff -puN mm/shmem.c~mm-thp-swap-support-pmd-swap-mapping-in-free_swap_and_cache-swap_free mm/shmem.c
--- a/mm/shmem.c~mm-thp-swap-support-pmd-swap-mapping-in-free_swap_and_cache-swap_free
+++ a/mm/shmem.c
@@ -677,7 +677,7 @@ static int shmem_free_swap(struct addres
 	xa_unlock_irq(&mapping->i_pages);
 	if (old != radswap)
 		return -ENOENT;
-	free_swap_and_cache(radix_to_swp_entry(radswap));
+	free_swap_and_cache(radix_to_swp_entry(radswap), false);
 	return 0;
 }
 
@@ -1212,7 +1212,7 @@ static int shmem_unuse_inode(struct shme
 			spin_lock_irq(&info->lock);
 			info->swapped--;
 			spin_unlock_irq(&info->lock);
-			swap_free(swap);
+			swap_free(swap, false);
 		}
 	}
 	return error;
@@ -1750,7 +1750,7 @@ repeat:
 
 		delete_from_swap_cache(page);
 		set_page_dirty(page);
-		swap_free(swap);
+		swap_free(swap, false);
 
 	} else {
 		if (vma && userfaultfd_missing(vma)) {
diff -puN mm/swapfile.c~mm-thp-swap-support-pmd-swap-mapping-in-free_swap_and_cache-swap_free mm/swapfile.c
--- a/mm/swapfile.c~mm-thp-swap-support-pmd-swap-mapping-in-free_swap_and_cache-swap_free
+++ a/mm/swapfile.c
@@ -885,7 +885,7 @@ no_page:
 }
 
 #ifdef CONFIG_THP_SWAP
-static int swap_alloc_cluster(struct swap_info_struct *si, swp_entry_t *slot)
+static int __swap_alloc_cluster(struct swap_info_struct *si, swp_entry_t *slot)
 {
 	unsigned long idx;
 	struct swap_cluster_info *ci;
@@ -911,7 +911,7 @@ static int swap_alloc_cluster(struct swa
 	return 1;
 }
 
-static void swap_free_cluster(struct swap_info_struct *si, unsigned long idx)
+static void __swap_free_cluster(struct swap_info_struct *si, unsigned long idx)
 {
 	unsigned long offset = idx * SWAPFILE_CLUSTER;
 	struct swap_cluster_info *ci;
@@ -924,7 +924,7 @@ static void swap_free_cluster(struct swa
 	swap_range_free(si, offset, SWAPFILE_CLUSTER);
 }
 #else
-static int swap_alloc_cluster(struct swap_info_struct *si, swp_entry_t *slot)
+static int __swap_alloc_cluster(struct swap_info_struct *si, swp_entry_t *slot)
 {
 	VM_WARN_ON_ONCE(1);
 	return 0;
@@ -996,7 +996,7 @@ start_over:
 		}
 		if (cluster) {
 			if (!(si->flags & SWP_FILE))
-				n_ret = swap_alloc_cluster(si, swp_entries);
+				n_ret = __swap_alloc_cluster(si, swp_entries);
 		} else
 			n_ret = scan_swap_map_slots(si, SWAP_HAS_CACHE,
 						    n_goal, swp_entries);
@@ -1215,8 +1215,10 @@ static unsigned char __swap_entry_free_l
 				count = SWAP_MAP_MAX | COUNT_CONTINUED;
 			else
 				count = SWAP_MAP_MAX;
-		} else
+		} else {
+			VM_BUG_ON(!count);
 			count--;
+		}
 	}
 
 	usage = count | has_cache;
@@ -1255,17 +1257,90 @@ static void swap_entry_free(struct swap_
 	swap_range_free(p, offset, 1);
 }
 
+#ifdef CONFIG_THP_SWAP
+static unsigned char swap_free_cluster(struct swap_info_struct *si,
+				       swp_entry_t entry)
+{
+	struct swap_cluster_info *ci;
+	unsigned long offset = swp_offset(entry);
+	unsigned int count, i, free_entries = 0, cache_only = 0;
+	unsigned char *map, ret = 1;
+
+	ci = lock_cluster(si, offset);
+	VM_BUG_ON(!is_cluster_offset(offset));
+	/* Cluster has been split, free each swap entries in cluster */
+	if (!cluster_is_huge(ci)) {
+		unlock_cluster(ci);
+		for (i = 0; i < SWAPFILE_CLUSTER; i++, entry.val++) {
+			if (!__swap_entry_free(si, entry, 1)) {
+				free_entries++;
+				free_swap_slot(entry);
+			}
+		}
+		return !(free_entries == SWAPFILE_CLUSTER);
+	}
+	count = cluster_count(ci) - 1;
+	VM_BUG_ON(count < SWAPFILE_CLUSTER);
+	cluster_set_count(ci, count);
+	map = si->swap_map + offset;
+	for (i = 0; i < SWAPFILE_CLUSTER; i++) {
+		if (map[i] == 1) {
+			map[i] = SWAP_MAP_BAD;
+			free_entries++;
+		} else if (__swap_entry_free_locked(si, ci, offset + i, 1) ==
+			   SWAP_HAS_CACHE)
+			cache_only++;
+	}
+	VM_BUG_ON(free_entries && (count != SWAPFILE_CLUSTER ||
+				   (map[0] & SWAP_HAS_CACHE)));
+	if (free_entries == SWAPFILE_CLUSTER)
+		memset(map, SWAP_HAS_CACHE, SWAPFILE_CLUSTER);
+	else if (!cluster_swapcount(ci) && !(map[0] & SWAP_HAS_CACHE))
+		cluster_clear_huge(ci);
+	unlock_cluster(ci);
+	if (free_entries == SWAPFILE_CLUSTER) {
+		spin_lock(&si->lock);
+		mem_cgroup_uncharge_swap(entry, SWAPFILE_CLUSTER);
+		__swap_free_cluster(si, offset / SWAPFILE_CLUSTER);
+		spin_unlock(&si->lock);
+		ret = 0;
+	} else if (free_entries) {
+		ci = lock_cluster(si, offset);
+		for (i = 0; i < SWAPFILE_CLUSTER; i++, entry.val++) {
+			if (map[i] == SWAP_MAP_BAD) {
+				map[i] = SWAP_HAS_CACHE;
+				unlock_cluster(ci);
+				free_swap_slot(entry);
+				ci = lock_cluster(si, offset);
+			}
+		}
+		unlock_cluster(ci);
+	} else if (cache_only == SWAPFILE_CLUSTER)
+		ret = SWAP_HAS_CACHE;
+
+	return ret;
+}
+#else
+static inline unsigned char swap_free_cluster(struct swap_info_struct *si,
+					      swp_entry_t entry)
+{
+	return 0;
+}
+#endif
+
 /*
  * Caller has made sure that the swap device corresponding to entry
  * is still around or has not been recycled.
  */
-void swap_free(swp_entry_t entry)
+void swap_free(swp_entry_t entry, bool cluster)
 {
 	struct swap_info_struct *p;
 
 	p = _swap_info_get(entry);
 	if (p) {
-		if (!__swap_entry_free(p, entry, 1))
+		if (thp_swap_supported() && cluster)
+			swap_free_cluster(p, entry);
+		else if (!__swap_entry_free(p, entry, 1))
 			free_swap_slot(entry);
 	}
 }
@@ -1326,7 +1401,7 @@ static void swapcache_free_cluster(swp_e
 	if (free_entries == SWAPFILE_CLUSTER) {
 		spin_lock(&si->lock);
 		mem_cgroup_uncharge_swap(entry, SWAPFILE_CLUSTER);
-		swap_free_cluster(si, idx);
+		__swap_free_cluster(si, idx);
 		spin_unlock(&si->lock);
 	} else if (free_entries) {
 		for (i = 0; i < SWAPFILE_CLUSTER; i++, entry.val++) {
@@ -1730,7 +1805,7 @@ int try_to_free_swap(struct page *page)
  * Free the swap entry like above, but also try to
  * free the page cache entry if it is the last user.
  */
-int free_swap_and_cache(swp_entry_t entry)
+int free_swap_and_cache(swp_entry_t entry, bool cluster)
 {
 	struct swap_info_struct *p;
 	struct page *page = NULL;
@@ -1741,7 +1816,8 @@ int free_swap_and_cache(swp_entry_t entr
 
 	p = _swap_info_get(entry);
 	if (p) {
-		count = __swap_entry_free(p, entry, 1);
+		count = cluster ? swap_free_cluster(p, entry) :
+			__swap_entry_free(p, entry, 1);
 		if (count == SWAP_HAS_CACHE &&
 		    !swap_page_trans_huge_swapped(p, entry)) {
 			page = find_get_page(swap_address_space(entry),
@@ -1750,7 +1826,7 @@ int free_swap_and_cache(swp_entry_t entr
 				put_page(page);
 				page = NULL;
 			}
-		} else if (!count)
+		} else if (!count && !cluster)
 			free_swap_slot(entry);
 	}
 	if (page) {
@@ -1914,7 +1990,7 @@ static int unuse_pte(struct vm_area_stru
 		mem_cgroup_commit_charge(page, memcg, false, false);
 		lru_cache_add_active_or_unevictable(page, vma);
 	}
-	swap_free(entry);
+	swap_free(entry, false);
 	/*
 	 * Move the page to the active list so it is not
 	 * immediately swapped out again after swapon.
@@ -2353,6 +2429,16 @@ int try_to_unuse(unsigned int type, bool
 	}
 
 	mmput(start_mm);
+
+	/*
+	 * Swap entries may be marked as SWAP_MAP_BAD temporarily in
+	 * swap_free_cluster() before being freed really.
+	 * find_next_to_unuse() will skip these swap entries, that is
+	 * OK.  But we need to wait until they are freed really.
+	 */
+	while (!retval && READ_ONCE(si->inuse_pages))
+		schedule_timeout_uninterruptible(1);
+
 	return retval;
 }
 
_

Patches currently in -mm which might be from ying.huang@xxxxxxxxx are

mm-clear_huge_page-move-order-algorithm-into-a-separate-function.patch
mm-huge-page-copy-target-sub-page-last-when-copy-huge-page.patch
mm-hugetlbfs-rename-address-to-haddr-in-hugetlb_cow.patch
mm-hugetlbfs-pass-fault-address-to-cow-handler.patch
mm-swap-fix-race-between-swapoff-and-some-swap-operations.patch
mm-swap-fix-race-between-swapoff-and-some-swap-operations-v6.patch
mm-fix-race-between-swapoff-and-mincore.patch
mm-thp-swap-enable-pmd-swap-operations-for-config_thp_swap.patch
mm-thp-swap-make-config_thp_swap-depends-on-config_swap.patch
mm-thp-swap-support-pmd-swap-mapping-in-swap_duplicate.patch
mm-thp-swap-support-pmd-swap-mapping-in-swapcache_free_cluster.patch
mm-thp-swap-support-pmd-swap-mapping-in-free_swap_and_cache-swap_free.patch
mm-thp-swap-support-pmd-swap-mapping-when-splitting-huge-pmd.patch
mm-thp-swap-support-pmd-swap-mapping-in-split_swap_cluster.patch
mm-thp-swap-support-to-read-a-huge-swap-cluster-for-swapin-a-thp.patch
mm-thp-swap-swapin-a-thp-as-a-whole.patch
mm-thp-swap-support-to-count-thp-swapin-and-its-fallback.patch
mm-thp-swap-add-sysfs-interface-to-configure-thp-swapin.patch
mm-thp-swap-support-pmd-swap-mapping-in-swapoff.patch
mm-thp-swap-support-pmd-swap-mapping-in-madvise_free.patch
mm-cgroup-thp-swap-support-to-move-swap-account-for-pmd-swap-mapping.patch
mm-thp-swap-support-to-copy-pmd-swap-mapping-when-fork.patch
mm-thp-swap-free-pmd-swap-mapping-when-zap_huge_pmd.patch
mm-thp-swap-support-pmd-swap-mapping-for-madv_willneed.patch
mm-thp-swap-support-pmd-swap-mapping-in-mincore.patch
mm-thp-swap-support-pmd-swap-mapping-in-common-path.patch
mm-thp-swap-create-pmd-swap-mapping-when-unmap-the-thp.patch
mm-thp-avoid-to-split-thp-when-reclaim-madv_free-thp.patch

--
To unsubscribe from this list: send the line "unsubscribe mm-commits" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


