Re: [HMM 00/15] HMM (Heterogeneous Memory Management) v22

On Thu, Jun 01, 2017 at 12:04:02PM +1000, Balbir Singh wrote:
> On Thu, May 25, 2017 at 3:53 AM, Jerome Glisse <jglisse@xxxxxxxxxx> wrote:
> > On Wed, May 24, 2017 at 11:55:12AM +1000, Balbir Singh wrote:
> >> On Tue, May 23, 2017 at 2:51 AM, Jérôme Glisse <jglisse@xxxxxxxxxx> wrote:
> >> > Patchset is on top of mmotm (mmotm-2017-05-18), git branch:
> >> >
> >> > https://cgit.freedesktop.org/~glisse/linux/log/?h=hmm-v22
> >> >
> >> > The change since v21 is adding back special refcounting in put_page() to
> >> > catch when a ZONE_DEVICE page is free (refcount going from 2 to 1,
> >> > unlike a regular page where a refcount of 0 means the page is free).
> >> > See patch 8 of this series for this refcounting. I did not use static
> >> > keys because it kind of scares me to do that for an inline function.
> >> > If people strongly feel about this I can try to make static keys work
> >> > here. Kirill will most likely want to review this.
> >> >
> >> >
> >> > Everything else is the same. Below is the long description of what HMM
> >> > is about and why. At the end of this email I briefly describe each patch
> >> > and suggest reviewers for each of them.
> >> >
> >> >
> >> > Heterogeneous Memory Management (HMM) (description and justification)
> >> >
> >>
> >> Thanks for the patches! These patches are very helpful. There are a
> >> few additional things we would need on top of this (once the HMM base
> >> is merged):
> >>
> >> 1. Support for other architectures; we'd like to make sure we can get
> >> this working for powerpc, for example. As a first step we have
> >> ZONE_DEVICE enablement patches, but I think we need some additional
> >> patches for iomem space searching and memory hotplug, IIRC.
> >> 2. HMM-CDM and physical-address-based migration bits. In a recent RFC
> >> we decided to try using HMM-CDM as the starting point for implementing
> >> coherent device memory. It would be nice to have those patches on top
> >> of these once they make it to mm -
> >> https://lwn.net/Articles/720380/
> >>
> >
> > I intend to post the updated HMM CDM patchset early next week. I am
> > tied up with a couple of internal backports, but I should be able to
> > resume work on that this week.
> >
> 
> Thanks, I am looking at the HMM CDM branch and trying to forward-port it
> to see what the results look like on top of HMM v23. Do we have a timeline
> for the v23 merge?
> 

So I am moving to a new office and it has taken me more time than I thought
to pack everything. Attached is the first step of CDM on top of the latest
HMM. I hope to have more time tomorrow or next week to finish rebasing the
patches and to run some tests with stolen RAM as CDM memory.
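
To make the intent of MEMORY_DEVICE_PUBLIC concrete, here is a small
illustrative helper (not part of the patch; the name
device_page_is_cpu_mappable() is made up) showing how core code could use
the predicates added below: private device pages are not CPU addressable
and must go through special swap entries, while public device pages are
cache coherent and can be mapped like ordinary pages.

static bool device_page_is_cpu_mappable(const struct page *page)
{
	/* Regular system memory is always CPU addressable. */
	if (!is_zone_device_page(page))
		return true;
	/* Cache coherent device memory behaves like a normal page here. */
	if (is_device_public_page(page))
		return true;
	/* MEMORY_DEVICE_PRIVATE (and any other ZONE_DEVICE type) is not. */
	return false;
}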

Jérôme
>From 0ca0ebe4aecedfe69ae029c529045d609352b921 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= <jglisse@xxxxxxxxxx>
Date: Thu, 1 Jun 2017 11:25:59 -0400
Subject: [PATCH] mm/device-public-memory: device memory cache coherent with
 CPU
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Platforms with an advanced system bus (like CAPI or CCIX) allow device
memory to be accessible from the CPU in a cache coherent fashion. Add
a new type of ZONE_DEVICE memory to represent such memory. The use cases
are the same as for un-addressable device memory, but without all the
corner cases.

Signed-off-by: Jérôme Glisse <jglisse@xxxxxxxxxx>
---
 include/linux/ioport.h   |  1 +
 include/linux/memremap.h | 21 +++++++++++++++++++++
 mm/Kconfig               | 13 +++++++++++++
 mm/memory.c              | 13 +++++++++++++
 mm/migrate.c             | 23 ++++++++++++++---------
 5 files changed, 62 insertions(+), 9 deletions(-)

diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index 3a4f691..f5cf32e 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -131,6 +131,7 @@ enum {
 	IORES_DESC_PERSISTENT_MEMORY		= 4,
 	IORES_DESC_PERSISTENT_MEMORY_LEGACY	= 5,
 	IORES_DESC_DEVICE_PRIVATE_MEMORY	= 6,
+	IORES_DESC_DEVICE_PUBLIC_MEMORY		= 7,
 };
 
 /* helpers to define resources */
diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 0e0d2e6..b9f460a 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -56,10 +56,18 @@ static inline struct vmem_altmap *to_vmem_altmap(unsigned long memmap_start)
  * page must be treated as an opaque object, rather than a "normal" struct page.
  * A more complete discussion of unaddressable memory may be found in
  * include/linux/hmm.h and Documentation/vm/hmm.txt.
+ *
+ * MEMORY_DEVICE_PUBLIC:
+ * Device memory that is cache coherent from both the device and CPU points of
+ * view. This is used on platforms that have an advanced system bus (like CAPI
+ * or CCIX). A driver can hotplug the device memory using ZONE_DEVICE with that
+ * memory type. Any page of a process can be migrated to such memory. However
+ * no one should be allowed to pin such memory so that it can always be evicted.
  */
 enum memory_type {
 	MEMORY_DEVICE_HOST = 0,
 	MEMORY_DEVICE_PRIVATE,
+	MEMORY_DEVICE_PUBLIC,
 };
 
 /*
@@ -91,6 +99,8 @@ enum memory_type {
  * The page_free() callback is called once the page refcount reaches 1
  * (ZONE_DEVICE pages never reach 0 refcount unless there is a refcount bug.
  * This allows the device driver to implement its own memory management.)
+ *
+ * For MEMORY_DEVICE_PUBLIC only the page_free() callback matters.
  */
 typedef int (*dev_page_fault_t)(struct vm_area_struct *vma,
 				unsigned long addr,
@@ -133,6 +143,12 @@ static inline bool is_device_private_page(const struct page *page)
 	return is_zone_device_page(page) &&
 		page->pgmap->type == MEMORY_DEVICE_PRIVATE;
 }
+
+static inline bool is_device_public_page(const struct page *page)
+{
+	return is_zone_device_page(page) &&
+		page->pgmap->type == MEMORY_DEVICE_PUBLIC;
+}
 #else
 static inline void *devm_memremap_pages(struct device *dev,
 		struct resource *res, struct percpu_ref *ref,
@@ -156,6 +172,11 @@ static inline bool is_device_private_page(const struct page *page)
 {
 	return false;
 }
+
+static inline bool is_device_public_page(const struct page *page)
+{
+	return false;
+}
 #endif
 
 /**
diff --git a/mm/Kconfig b/mm/Kconfig
index 46296d5d7..bacb193 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -758,6 +758,19 @@ config DEVICE_PRIVATE
 	  memory; i.e., memory that is only accessible from the device (or
 	  group of devices).
 
+config DEVICE_PUBLIC
+	bool "Unaddressable device memory (GPU memory, ...)"
+	depends on X86_64
+	depends on ZONE_DEVICE
+	depends on MEMORY_HOTPLUG
+	depends on MEMORY_HOTREMOVE
+	depends on SPARSEMEM_VMEMMAP
+
+	help
+	  Allows creation of struct pages to represent addressable device
+	  memory; i.e., memory that is accessible from both the device and
+	  the CPU.
+
 config FRAME_VECTOR
 	bool
 
diff --git a/mm/memory.c b/mm/memory.c
index eba61dd..d192f3d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -983,6 +983,19 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		get_page(page);
 		page_dup_rmap(page, false);
 		rss[mm_counter(page)]++;
+	} else if (pte_devmap(pte)) {
+		page = pte_page(pte);
+
+		/*
+		 * Cache coherent device memory behaves like a regular page and
+		 * not like a persistent memory page. For more information see
+		 * MEMORY_DEVICE_PUBLIC in include/linux/memremap.h.
+		 */
+		if (is_device_public_page(page)) {
+			get_page(page);
+			page_dup_rmap(page, false);
+			rss[mm_counter(page)]++;
+		}
 	}
 
 out_set_pte:
diff --git a/mm/migrate.c b/mm/migrate.c
index d7c4db6..a0115b8 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -229,12 +229,16 @@ static bool remove_migration_pte(struct page *page, struct vm_area_struct *vma,
 		if (is_write_migration_entry(entry))
 			pte = maybe_mkwrite(pte, vma);
 
-		if (unlikely(is_zone_device_page(new)) &&
-		    is_device_private_page(new)) {
-			entry = make_device_private_entry(new, pte_write(pte));
-			pte = swp_entry_to_pte(entry);
-			if (pte_swp_soft_dirty(*pvmw.pte))
-				pte = pte_mksoft_dirty(pte);
+		if (unlikely(is_zone_device_page(new))) {
+			if (is_device_private_page(new)) {
+				entry = make_device_private_entry(new, pte_write(pte));
+				pte = swp_entry_to_pte(entry);
+				if (pte_swp_soft_dirty(*pvmw.pte))
+					pte = pte_mksoft_dirty(pte);
+			} else if (is_device_public_page(new)) {
+				pte = pte_mkdevmap(pte);
+				flush_dcache_page(new);
+			}
 		} else
 			flush_dcache_page(new);
 
@@ -2300,9 +2304,10 @@ static bool migrate_vma_check_page(struct page *page)
 
 	/* Page from ZONE_DEVICE have one extra reference */
 	if (is_zone_device_page(page)) {
-		if (is_device_private_page(page)) {
+		if (is_device_private_page(page) ||
+		    is_device_public_page(page))
 			extra++;
-		} else
+		else
 			/* Other ZONE_DEVICE memory type are not supported */
 			return false;
 	}
@@ -2621,7 +2626,7 @@ static void migrate_vma_pages(struct migrate_vma *migrate)
 					migrate->src[i] &= ~MIGRATE_PFN_MIGRATE;
 					continue;
 				}
-			} else {
+			} else if (!is_device_public_page(newpage)) {
 				/*
 				 * Other types of ZONE_DEVICE page are not
 				 * supported.
-- 
2.4.11

